Garbage collection (was Re: JWZ on s/Java/Perl/)

Dan Sugalski Fri, 09 Feb 2001 09:37:08 -0800
At 12:06 PM 2/9/2001 -0500, Ken Fox wrote:
>Branden wrote:
> > I actually don't understand how traversing a graph can be faster than
> > incrementing/decrementing/testing for zero on a refcount.
>
>There are two main reasons advanced garbage collectors are fast:
>
>  1. Cheap allocations. Most fast collectors have a one or two
>     instruction malloc. In C it looks like this:
>
>       void *malloc(size) { void *obj = heap; heap += size; return obj; }
>
>     It's easier to do alignments in a macro layer above the allocator
>     so the allocator doesn't have to constantly re-align to address
>     boundaries. There is basically no difference between the performance
>     of heap and stack allocations with a good collector.

This is definitely very true. It cuts out the overhead of free as well, 
since you don't have to free any data (perl pays this with realloc a lot, 
since realloc's a malloc, copy, and free). Plus there's no need to mess 
with any sort of 'allocated memory' list, which malloc and free currently 
need to keep so they don't leak memory.

>  2. Work proportional to live data, not total data. This is hard to
>     believe for a C programmer, but good garbage collectors don't have
>     to "free" every allocation -- they just have to preserve the live,
>     or reachable, data. Some researchers have estimated that 90% or
>     more of all allocated data dies (becomes unreachable) before the
>     next collection. A ref count system has to work on every object,
>     but smarter collectors only work on 10% of the objects.

As is this. (Perl can generate a lot of garbage if you're messing around 
with strings and arrays a lot)

Also, one thing people forget is that manipulating reference counts can get 
expensive. It doesn't seem like much--an integer increment or decrement 
here or there. No big deal, right? Well, that cost tends to add up after a 
while. Its paid in lots of tiny little pieces rather than in a few big 
chunks, but the total time taken by it is larger.

It's also possible that by tossing refcounts we can shrink down the size of 
a perl variable structure (though I know it's not that way now) or at least 
move the GC field to the end, where it's less likely to be loaded. Most 
fast processors these days fetch data into cache in 8 or 16 byte chunks, so 
moving the GC field outside of the active chunk area means we won't be 
loading in dead data (okay, it's only resting!) every time we access a 
variable. There's no point in doing this with perl 5, since it's not dead 
data, but with a non-refcount GC scheme it'll be accessed much less.

Finally, all you really need to do is read the last day or so of p5p where 
Alan's trying to plug a batch of perl memory leaks to see how well the 
refcount scheme seems to be working now...

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
Garbage collection (was Re: JWZ on s/Java/Perl/)

Reply via email to