On Fri, 27 Aug 2010 18:06:19 -0700, Paul Rubin wrote: > Steven D'Aprano <st...@remove-this-cybersource.com.au> writes: >> I've repeatedly asked, both here and elsewhere, why reference counting >> isn't "real" garbage collection. Nobody has been able to give me a >> satisfactory answer. As far as I can tell, it's a bit of >> pretentiousness with no basis in objective fact. > > Well, it's a bit of a subjective matter. I'd say it's not real gc > because 1) it's unsound (misses reference cycles),
You can add cycle detection to a reference count gc, at the cost of more complexity. If you read the Wikipedia article I linked to, tracing algorithms can also be unsound: Some collectors running in a particular environment can correctly identify all pointers (references) in an object; these are called "precise" (also "exact" or "accurate") collectors, the opposite being a "conservative" or "partly conservative" collector. Conservative collectors have to assume that any bit pattern in memory could be a pointer if (when interpreted as a pointer) it would point into any allocated object. Thus, conservative collectors may have some false negatives, where storage is not released because of accidental fake pointers... http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) > and 2) it requires > constant attention from the mutator to incr and decr the reference > counts. Yes. And? > So developing modules for the CPython API means endlessly > finding and fixing refcount bugs that lead to either crashes/security > failures, or memory leaks. If you program the Java JNI or a typical > Lisp FFI, you'll find that real gc is a lot simpler to use since you > avoid all the refcount maintenance hassles. You allocate memory and > shut your eyes, and the gc takes care of freeing it when it figures out > that you are done. Refcounting is basically a form of manual memory > management, while gc is automatic. That's a problem with the CPython API, not reference counting. The problem is that the CPython API is written at too low a level, beneath that at which the garbage collector exists, so naturally you have to manually manage memory. > Someone said here recently that as a program gets larger, saying "this > will work as long as we do X every time without fail" becomes equal to > saying "this won't work". Substitute "properly maintain all ref counts" > for X and you can see the problem. I've seen released "production" > "tested" Python C modules with subtle refcount bugs on more than one > occasion. In gc'd systems there are fewer places for the code to go > wrong. On the other hand, tracing gcs have their own set of problems too, mostly due to the use of finalizers and attempts to make garbage collection run more predictably. See here: http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/topic/com.ibm.java.doc.diagnostics.142j9/html/coexistwithgc.html Quote: For tidying Java resources, think about the use of a clean up routine. When you have finished with an object, call the routine to null out all references, deregister listeners, clear out hash tables, and so on. This is far more efficient than using a finalizer and has the useful side-benefit of speeding up garbage collection. The Garbage Collector does not have so many object references to chase in the next garbage collection cycle. Translated: "Rather than relying on the garbage collector to clean up resources after you, do it yourself, manually, so the garbage collector has less work to do." Tracing garbage collectors aren't a panacea. They're software themselves, and complex software, which means they're subject to bugs like the one which plagued Flash plugin 9: http://gskinner.com/blog/archives/2008/04/failure_to_unlo.html The more complicated the garbage collector, the more scope you have for some interaction between your high-level code and the gc leading to memory not be reclaimed or extreme slowdown. Like this: http://tech.puredanger.com/2009/02/11/linkedblockingqueue-garbagecollection/ -- Steven -- http://mail.python.org/mailman/listinfo/python-list