Hello, everyone. My name is David Wang and I am one of the students participating in Google Summer of Code this year. As you may remember, my project is to implement a garbage collector for circular references in PHP. As the midterm for Summer of Code is coming up, my mentor, Derick Rethans, thought it would be a good idea if I shared my progress with the community.
As you know, PHP uses reference counting as a simple garbage collection system. One of the primary weaknesses of reference counting systems is that objects that refer indirectly or directly to themselves, i.e. reference cycles, are not collected. The accumulation of unreferenced memory that is only deallocated at the end of a request can be prohibitively expensive. The cycle collector I am implementing directly addresses this problem. The cycle collector is, of necessity, a type of tracing garbage collector. However, it uses reference count information to accelerate the collection process. Essentially, it works by removing the internal reference counts (that is, references from an object within the cycle to another object within the cycles )from candidate cycles. In a cycle that can be collected, after all internal reference counts have been removed, the reference count of all objects within the cycle should be 0. The particular algorithm that I am using is the synchronous cycle collector described by David Bacon and V. T. Rajan in Concurrent Cycle Collection in Reference Counted Systems (see the article for further details). Various optimizations allow the cycle collector to be run relatively infrequently and execute speedily when it does run. Note that because PHP is designed to be single-threaded, a synchronous algorithm is used which does pause program execution for cycle collection when cycle collection becomes necessary. If a program has no reference cycles, the cycle collector does not run at all. The initial phase of implementation is complete, and I am currently profiling, optimizing and trying the modified version of PHP on various test programs in an effort to find bugs. Meanwhile, here are some recent benchmarks. Some aspects of the eZ Components (http://ez.no/ezcomponents, available from SVN with the instructions here: http://ez.no/community/articles/an_introduction_to_ez_components/installation) testing suite use reference cycles heavily. The Template test uses circular references the most frequently, the Graph test also uses circular references, albeit less heavily. On the Graph test, maximum memory usage with unmodified PHP was 133.9 MB with an execution time of 8 seconds. On the Graph test, maximum memory usage with gc was 51.6 MB with an execution time of 9 seconds. On the Template test, maxmium memory usage with unmodified PHP was 1.5 GB with an execution time of 30 seconds. On the Template test, maxmium memory usage with gc was 67.3 MB with an execution time of 1 minute. On the whole suite of tests (which includes the Graph and Template tests), execution time with unmodified PHP was 12:03. With cycle collection, it was 12:43. These tests were conducted on my dual core AMD X2 4400+ desktop with ./configure --with-gd --with-jpeg-dir --with-zlib. As you can see, there is the classic time vs. memory trade-off. My project is currently being hosted on the xdebug CVS. You can get the latest version with the following commands: cvs -d :pserver:[EMAIL PROTECTED]:/repository login CVS password: srmread cvs -d :pserver:[EMAIL PROTECTED]:/repository co circular Note that I'm implementing my project on top of a CVS version of PHP 5.2 that is a couple of weeks old. However, the cycle collector can be ported into other versions of PHP fairly easily as it does not affect the existing Zend engine code too much. If anyone has any questions, I'll be more than happy to answer them! Regards, Yiduo (David) Wang -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php