Hello, everyone. My name is David Wang and I am one of the students
participating in Google Summer of Code this year. As you may remember,
my project is to implement a garbage collector for circular references
in PHP. As the midterm for Summer of Code is coming up, my mentor,
Derick Rethans, thought it would be a good idea if I shared my
progress with the community.

As you know, PHP uses reference counting as a simple garbage
collection system. One of the primary weaknesses of reference counting
systems is that objects that refer indirectly or directly to
themselves, i.e. reference cycles, are not collected. The accumulation
of unreferenced memory that is only deallocated at the end of a
request can be prohibitively expensive. The cycle collector I am
implementing directly addresses this problem.

The cycle collector is, of necessity, a type of tracing garbage
collector. However, it uses reference count information to accelerate
the collection process. Essentially, it works by removing the internal
reference counts (that is, references from an object within the cycle
to another object within the cycles )from candidate cycles. In a cycle
that can be collected, after all internal reference counts have been
removed, the reference count of all objects within the cycle should be
0. The particular algorithm that I am using is the synchronous cycle
collector described by David Bacon and V. T. Rajan in Concurrent Cycle
Collection in Reference Counted Systems (see the article for further
details). Various optimizations allow the cycle collector to be run
relatively infrequently and execute speedily when it does run.

Note that because PHP is designed to be single-threaded, a synchronous
algorithm is used which does pause program execution for cycle
collection when cycle collection becomes necessary. If a program has
no reference cycles, the cycle collector does not run at all.

The initial phase of implementation is complete, and I am currently
profiling, optimizing and trying the modified version of PHP on
various test programs in an effort to find bugs. Meanwhile, here are
some recent benchmarks.

Some aspects of the eZ Components (http://ez.no/ezcomponents,
available from SVN with the instructions here:
http://ez.no/community/articles/an_introduction_to_ez_components/installation)
testing suite use reference cycles heavily. The Template test uses
circular references the most frequently, the Graph test also uses
circular references, albeit less heavily.

On the Graph test, maximum memory usage with unmodified PHP was 133.9
MB with an execution time of 8 seconds.
On the Graph test, maximum memory usage with gc was 51.6 MB with an
execution time of 9 seconds.
On the Template test, maxmium memory usage with unmodified PHP was 1.5
GB with an execution time of 30 seconds.
On the Template test, maxmium memory usage with gc was 67.3 MB with an
execution time of 1 minute.
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.
These tests were conducted on my dual core AMD X2 4400+ desktop with
./configure --with-gd --with-jpeg-dir --with-zlib. As you can see,
there is the classic time vs. memory trade-off.

My project is currently being hosted on the xdebug CVS. You can get
the latest version with the following commands:

cvs -d :pserver:[EMAIL PROTECTED]:/repository login
CVS password: srmread
cvs -d :pserver:[EMAIL PROTECTED]:/repository co circular
Note that I'm implementing my project on top of a CVS version of PHP
5.2 that is a couple of weeks old. However, the cycle collector can be
ported into other versions of PHP fairly easily as it does not affect
the existing Zend engine code too much.

If anyone has any questions, I'll be more than happy to answer them!

Regards,

Yiduo (David) Wang

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to