Oct 18 1999.

Barry Hayes Scavenger Notes


Date: Tuesday, 28 January 1997 11:48:56
From: eliot@parcplace.com
Subject: Re: FixedSpace and Others

Hi John,

I'm forwarding your message to Barry Hayes who's our resident GC guru and the actual designer and implementor of FixedSpace. I'm sure he can help better than I can.


John --

Feel free to ask me any more you need to know ..

A scavenge happens, usually, when Eden is nearly full -- there's some buffering room -- and copies Eden and the full semi-space to the empty semi-space. The idea is that most of the objects in Eden will be garbage, so the target space of the scavenge doesn't have to be as big as the source space. Thus, source is Eden+live semi-space, target is empty semi-space.

Some volume of objects in the source semi-space is targeted for tenure to oldSpace at the start. If the guess about how much of Eden would survive is wrong and the target semi-space fills up, the scavenge continues, but the rest of the survivors go into oldSpace. The guess about how MUCH to tenure is based on how full the space is. The target of exactly which objects to tenure is based on their addresses -- objects below a certain address are tenured. [In addition, objects with large bodies are also tenured quickly -- I assume that
someone found that they have higher than usual survival times, and copying the header was a waste of effort. In addition, the scavenge may be called with the explicit GOAL of tenuring certain objects -- for example, objects that are involved in a becomes: that can't cope with newSpace objects. Real low-level technical stuff ..]

The roots of a scavenge are the RT and the stacks. Maybe a few strange globals as well, but no traversal of Old/Perm space.

Truth is, I'm a bit shakey about where the mark/sweep starts as well. But the difference between a collect and a collect-all is just that the oldRT is used is part of the root-set, and permSpace isn't examined. So anyway, the roots are (1) the oldRT, (2) the stacks. If this is a stop-the-world collection, like you get from the pulldown menu, it's all pretty easy. The world is stopped, the mark marks objects [including newSpace objects] but only frees objects in oldSpace [or old and perm for a full GC]. The objects in newSpace are dead, but the GC doesn't want to step on the scavenger's toes -- the next scavenge will free those objects.

Ah yes, it's starting to come back to me .. The incGC is tougher. While all the other threads are running, and scavenges are going, a background process is trying to find free memory by taking limited slices of time. This starts at the oldRT and stacks.

The scavenger aids the incGC, as part of its job, by making sure that when it promotes an object that is pointed to by a marked object in the RT, that object is marked. Whew. If it promotes an object that is pointed to from the RT but the old-space object isn't marked, then either the incGC hasn't marked it yet, and will now just trace through the promoted object, or the object in the RT isn't really reachable, and the promoted object will get collected in the sweep ..

[And another aside, but I hope you can handle it .. aggressively tenuring the stuff at low addresses may be the WRONG thing to do. The first part of the scavenge transitively traces the marked objects in the RT. I assume that an object in newSpace pointed to from oldSpace will probably survive. Just gut instinct. Promoting THOSE objects is good. Then it lays down objects reachable from the stacks and unmarked RT and then does the transitive closure copying thing. Promoting THOSE objects may well be bad. IE, if you're not in an incGC, the first objects that get tenured are those reachable directly from the stack, maybe the newest of the new. And they'll drag a lot of stuff over the border with them in future scavenges .. Anyway. Digression, and more than you probably
need ..]

You're right about fixed space. Objects bodies live in fixed space, not headers. The headers are in the same new/old/perm progression as any other object. When the header is collected, the body is collected.


You say:
However I thought the Eden GC was generational stop and copy to Old (this being the active semispace). The semispaces use a concurrent copying GC (Bakers?) and tenure to OldSpace. (This being a guess on my part, but I'd rather not sit in front of 300 peope and look silly).

 

No. When Eden is full, Eden and the "from" space are copied to the "to" space together, with no distinction made between Eden and fromSpace.

As to WHY it's done that way, I can be less definite. The newest thing in Eden will almost CERTAINLY be alive. You're scavenging because you just MADE it, after all. I suppose that the reason you don't just copy to fromSpace is that it's mostly full already. And besides, if you did that you'd need to find all the pointers to Eden from fromSpace, and that would be another painful RT.


I don't know what you mean by "Eden GC". There are three things that are collecting garbage. (1) The generational scavenger, which does a two-space-plus-Eden Baker-style stop-and-copy. (2) The stop-the-world mark-sweep compacting collector, which collects old or old+perm, invoked from the pulldown or in fairly nasty out-of-memory panics. (3) The incremental marking collector, (which does no compaction) and collects objects with headers in oldSpace (and bodies in old, large, fixed, and rarely permSpace).

Call these "The scavenger" "The compacting collector" and "the incremental collector".

The lifetimes of large objects are going to depend on what you are benchmarking. Well, THAT's not saying much. For example, if window bitmaps are in Smalltalk, they might have fairly long lives. That was the idea, I think, behind the way things work in VW. I suspect that Frank had his mind changed and never got around to doing anything about it in the code. He's mentioned ripping out large space, in fact.

The Scavenger does adaptive tenuring ala Dave and Frank. And as I said, it doesn't do it by AGE, but rather by location in memory which is at best random, and as I said, perhaps worse.

In the VM, the incremental GC runs until it has done a bit of stuff. This is defined as a certain about of marking or sweeping, or whatever. The primitive to the VM is "Do some smallish amount of stuff and return". I don't understand how the VI wraps this, and I don't know what kind of interrupts you are asking about.



--- you ask ---
So I assume that by scavenging from eden to the empty semispace you put objects in order of allocation since these NEW objects go at the top of the semispace and then when you copy objects from the other semispace they've survivied at least one scavenge event so if room is short you tenure stuff you can't fit to OldSpace which then possibly might have1+N scavenges?
-------------

 

The scavenge is a stop-the-world two space copy with an Eden space, not incremental or concurrent. It is designed to be fast enough to not require a long pause.

New object in eden are in creation order. But many people have pointed out that two-space copying is breadth-first traversal, and that is the order that things get into after the copy, not creation order. Trace down what happens if from-space contains just two linked lists.


Again, the objects end up in breadth-first order, but it seems as if there's another question in there as well, but I can't quite figure it out ..

Oh yes yes. THAT interrupt. Sorry -- you were clear, I was foggy. Well, as I said, the incGC does a small amount of work at each call. In theory, small enough to be within the "acceptable pause" parameters. But if there's a process with higher priority than the incGC process, it will blow off that time-slice and return, and will quickly get pre-empted. Otherwise, it would take the full slice, and THEN do the switch. This would make response to the process switch [assume it's something like an I/O request] sluggish.

Slogging through the code, it looks like the idleLoop incGC allows interrupts here the lowSpaceAction does not. So the lowSpaceAction forces the incGC to make progress even if a higher priority process becomes runnable part way through the time-slice, whereas the idleLoop incGC calling will return early and let the other process run.