On Wed, Apr 27, 2005 at 03:43:32PM -0400, Dan Sugalski wrote:
[...]
At 5:40 PM +0200 4/27/05, Robin Redeker wrote:
>Just for the curious me: What was the design decision behind the GC
>solution? Was refcounting that bad? Refcounting gives a more global
>speed hit indeed, but it's more deterministic and you won't run into
>(probably) long halts during GC. (Java programs often suffer from this,
>and it results in bad latency).
I'll answer this one, since I'm the one responsible for it.
Refcounting has three big issues:
1) It's expensive 2) It's error-prone 3) It misses circular garbage
The expense is non-trivial as well. Yeah, it's all little tiny bits of time, but that adds up. It's all overhead, and useless overhead for the most part.
Yes, but do we know whether refcounting is really slower than a garbage collector in the end?
Yes, we do. This is a well-researched topic and one that's been gone over pretty thoroughly for the past twenty years or so. There's a lot of literature on this -- it's worth a run through citeseer for some of the papers or, if you've a copy handy in a local library, the book "Garbage Collection" by Jones and Lins, which is a good summary of most of the techniques, their costs, drawbacks, and implementation details.
> The circular garbage thing's also an issue. Yes, there are interesting hacks around it (python has one -- clever, but definitely a hack) that essentially involve writing a separate mark&sweep garbage collector.
I don't think circular references are used that much. This is maybe something a programmer still has to think a little bit about. And if it means that timely destruction maybe becomes slow only for the sake of collecting circular references... I don't know if that's a big feature.
Circular references are far more common than objects that truly need timely destruction, yes, and the larger your programs get the more of an issue it is. Neither are terribly common, though.
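A minimal sketch of the circular-garbage point, assuming CPython (which pairs refcounting with a separate cycle collector, the "hack" mentioned above). The `Node` class and the helper names are made up for illustration. With the cycle collector switched off, a lone object is freed as soon as its count drops to zero, while two objects that point at each other stay alive until the separate collector runs:

```python
import gc
import weakref


class Node:
    """Hypothetical toy object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_plain():
    # One object, no cycle: its refcount drops to zero when we return.
    solo = Node("solo")
    return weakref.ref(solo)


def make_cycle():
    # Two objects that reference each other: counts never reach zero.
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a
    return weakref.ref(a)


gc.disable()  # leave only refcounting active for the first two checks

plain_probe = make_plain()
cycle_probe = make_cycle()

# A dead weak reference returns None when called.
plain_freed_by_refcount = plain_probe() is None
cycle_freed_by_refcount = cycle_probe() is None

gc.collect()  # run the separate cycle collector by hand
cycle_freed_by_gc = cycle_probe() is None

gc.enable()
print(plain_freed_by_refcount, cycle_freed_by_refcount, cycle_freed_by_gc)
```

On CPython this should print `True False True`: refcounting alone reclaims the lone object but not the cycle. Other Python implementations manage memory differently, so the first two values are an assumption about CPython specifically.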
Are circular references such a big issue in perl5? I heard that the
built-in GC in perl5 only runs at program end and captures the circular
references, which sometimes causes a segfault (as a friend of mine experienced).
Perl 5, like all other refcounting GC systems, is essentially incremental and things get collected as the program runs. There's a final global destruction sweep that clears up anything still alive when the program exits, but this sweep doesn't guarantee ordering (and it really can't in the general case when there's circular garbage) and can in some cases cause segfaults if you've got finalizers written in C that don't properly handle last-gasp out of order cleanup.
> The thing is, there aren't any languages we care about that require true instant destruction -- the ones that care (or, rather, perl. Python and Ruby don't care) only guarantee block boundary collection, and in most cases (heck, in a near-overwhelming number of cases) timely destruction is utterly irrelevant to the code being run. Yeah, you *want* timely destruction, but you neither need nor notice it in most program runs, since there's nothing that will notice.
In many program runs you won't notice the overhead of refcounting either. And in scripts that only run up to (max) a minute, you won't even notice if the memory isn't managed at all.
We're building a general purpose engine, remember. It needs to handle programs with 10 objects that run in 10ms as well as ones with 10M objects that run for 10 weeks.
That argument actually favors non-refcount GC -- there's a minor speed win to a non-refcount system at the short end of program runs and a significant one at the large end of program runs.
And timely destruction is still a feature that's used much more than collection of circular references would be (IMHO)
In this case, YHO would turn out to be incorrect. Don't get me wrong, it's a sensible thing to think, it's just that it doesn't hold up on closer inspection.
> Having been too deep into the guts of perl, and having written more extensions in C than I care to admit to, I wanted refcounting *dead*. It's just spread across far too much code, tracking down errors is a massive pain, and, well, yech. Yes, non-refcounting GC systems are a bit more complex, but the complexity is well-contained and manageable. (I wrote the first cut of the GC system in an afternoon sitting in my local public library) There's also the added bonus that you can swap in all sorts of different GC schemes without disturbing 99% of the code base.
Just because refcounting is error-prone doesn't mean that a garbage
collector is better (and less error-prone).
I agree, the code is more localized. But I guess that memory leaks
(and resource leaks) that are caused by a bug in a garbage collector aren't that easy
to find and fix either.
Actually they are, significantly. Bugs in a centralized GC system show up reasonably quickly and usually very fatally -- they're pretty darn catastrophic in most cases, either causing programs to consume all memory and die, or collecting too soon and causing things to die. Indeed, if you take a look through the list archives, most of the subtle GC issues are in those places where we bypass the GC for some reason, usually in tracking external resources.
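To make the "well-contained" claim concrete, here is a toy mark-and-sweep collector over a hand-built object graph. It is only a sketch: `Obj`, `Heap`, `alloc`, and `collect` are hypothetical names for illustration and are not taken from Parrot's collector. All of the reclamation logic sits in one method, and an unreachable cycle is collected with no per-reference bookkeeping anywhere else:

```python
class Obj:
    """Hypothetical heap object: a name plus outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


class Heap:
    """Hypothetical heap that tracks every allocation and the root set."""

    def __init__(self):
        self.objects = []  # every object ever allocated and still live
        self.roots = []    # objects the program can reach directly

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self):
        # Mark phase: clear old marks, then walk everything reachable
        # from the roots using an explicit stack.
        for obj in self.objects:
            obj.marked = False
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if obj.marked:
                continue
            obj.marked = True
            stack.extend(obj.refs)

        # Sweep phase: anything left unmarked is garbage.
        dead = [obj.name for obj in self.objects if not obj.marked]
        self.objects = [obj for obj in self.objects if obj.marked]
        return dead


heap = Heap()
a = heap.alloc("a")
b = heap.alloc("b")
c = heap.alloc("c")
d = heap.alloc("d")

heap.roots.append(a)
a.refs.append(b)  # a -> b, reachable from the root
c.refs.append(d)  # c <-> d form a cycle that nothing reachable points to
d.refs.append(c)

freed = heap.collect()
survivors = [obj.name for obj in heap.objects]
print(freed, survivors)
```

Swapping in a different scheme (say, a copying or generational collector) would mean replacing `collect` and leaving the code that builds the graph alone, which is the kind of isolation described above.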
> I really need to go profile perl 5 some time to get some real stats, but I think it's likely that most programs (well, most programs I run at least) have less than 0.1% of the variables with destructors, and maybe one or two variables *total* that have a need for timely destruction. (And most of the time they get cleaned up by global destruction, which makes 'em not actually timely cleaned up)
IMHO timely destruction is not that useless. The alternative would be to free resources and close files explicitly, like one has to do in Java.
And doing resource freeing explicitly doesn't feel very Perl*-ish to me.
Oh, I'm not saying that automatic resource freeing's a bad idea -- it isn't. Neither am I saying that timely destruction's entirely useless -- it isn't. What I *am* saying is that a true need for timely destruction is rare when other facilities are provided, and as such it doesn't warrant being optimized for.
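As one illustration of what "other facilities" can look like, the sketch below uses a scoped cleanup construct as it appears in present-day Python (the `with` statement). This is an assumption chosen for illustration, not something the thread proposes for Parrot, and `Resource` is a hypothetical stand-in for a file handle or socket. The cleanup runs at the block boundary whatever the collector does:

```python
class Resource:
    """Hypothetical stand-in for a file handle or socket."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.is_open = True

    def close(self):
        # Idempotent: closing twice records the event only once.
        if self.is_open:
            self.is_open = False
            self.log.append(f"closed {self.name}")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions raised in the block


log = []
with Resource("logfile", log) as res:
    log.append(f"using {res.name}")
log.append("after block")

print(log)
```

The resource is released when the block exits, before "after block" is recorded, without relying on the object's reference count reaching zero.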
> >You said, that most languages will have refcount semantics, just sounds
> >funny for me to implement a GC then.
Actually most languages won't have refcount semantics. Perl 5's the only one that really guarantees that sort of thing now, though I think it's in there for perl 6. I doubt the python, ruby, Lisp, or Tcl compilers will emit the cleanup-at-block-boundary sweep code.
Well, python doesn't guarantee it. But I heard some people rely on that feature as the implementation does provide it.
Guido explicitly told me it's OK to break this sort of thing if I wanted. :)
But ok, as you have to manage resources manually in Lisp and Ruby anyway, the compilers won't output the boundary-sweep.
It's there for Perl6, right, and I thought that parrot is there for Perl6 in the first place.
Well.... no. Not for a number of years, but that's a very long story.
And there may be new and other languages that rely at least on timely destruction.
At the moment there aren't, and certainly not in the group of languages that we primarily care about. Parrot has, for example, features that make languages like C significantly difficult to implement on top of parrot, but we don't really care because C isn't one of our target languages.
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk