Hi,

> -----Original Message-----
> From: Andi Gutmans [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 12, 2007 9:06 PM
> To: David; Scott MacVicar
> Cc: Stas Malyshev; Sebastian Bergmann; internals@lists.php.net
> Subject: RE: [PHP-DEV] Mid-term Update for Cycle Collector (Google
> Summer of Code Project)
> 
> Hi David,
> 
> Isn't finding cycles an NP-complete problem? :) How can you know there
> are no cycles without actually looking for them?

That's right. =P I do traverse all objects for which it is possible to have
cycles. The time-saving observation is that a cycle that has no external
references to it can only be formed when the reference count for one of the
objects in the cycle is decremented to a non-zero value. I take those
objects and buffer them to look at later (the algorithm used to then mark
and sweep these cycles is more efficient when operated on a large buffer).
Is that what you mean?

> As for the additional if() statements you can try and use branch
> prediction hinting to the compiler in order to keep the non-gc version
> fast.

I haven't heard of that before... Is it some sort of compiler optimization
setting? Could you recommend to me some reference that would describe how
this can be done? It might help a lot.

> Btw, one of the challenges I found in the past is knowing which hash
> tables to actually traverse because some variables/hash tables in PHP
> are saved outside the symbol tables. Are you really successful in
> traversing all of them? Are you saving their addresses during
> allocation? If so, how do you know they are intact when you traverse?

Those are some tough questions! Here's what I decided to do in my
implementation: I decided to assume the cycles can only be formed with the
following types of nodes: zvals that are arrays, zvals that point to objects
that use the standard handler table, and objects in the objects store that
use the standard handler table. Of course, it's conceivable that some custom
object will form a reference cycle, but for custom objects, there's really
no good way for me to figure out what zvals they reference. I used to try to
use the get_properties handler, but for example, the SimpleXML extension
actually dynamically generates zvals on each invocation of get_properties.
On each collection cycle, I can traverse a node up to three times, and that
obviously would be pretty disgusting. I decided to just stick with standard
Zend objects.

For array nodes, I just traverse its hash table. For zvals that point to
objects, I just skip to the object it points to. For objects, I traverse its
hash table of properties.

How do I know these nodes are intact? I really don't, but I'm assuming if
they're still referenced, they are intact. Unless they are freed with a
reference count above zero (which does not appear to happen), that
assumption is true. I DO know that my starting points (the buffered nodes)
are intact because I remove them from my buffer when they are destroyed.
(Assumption here is that zval_dtor will be called on any zval before its
memory is freed).

> Do
> you start traversal only when we're supposedly in a stable state?
> (re-entrancy problems?)

A bad state to start traversal in would be one in which traversal will make
me wander into addresses that are bad. I only start collection when the user
explicitly invokes it, or when I detect a possible root. I only check for a
possible root when a reference to an object in object store is removed, or a
couple places in zend_execute, when a zval is assigned to (and overwritten)
and zval_ptr_dtor. This should catch all cycles and these places are
controlled places inside the Zend engine which are stable. 

> Also the additional size of zval can be quite an issue. In fact, with
> the whole 16bit->32bit transition we made for refcount we reduced
> performance. Maybe we can figure out a creative way to keep this down?

I've been wracking my brain for a solution to this problem. There are two
primary possibilities I can think of right now. One is to have a new struct
called zval_gc. Every allocation (and zval on the stack and in temporary
storage) will actually allocate the zval_gc which will be exactly the same
as zval, only with the two things I need tacked on the end. Then Zend will
only copy the zval part around and leave the gc part alone (that part
shouldn't be copied anyway) and speed will theoretically be the same.
However, this solution has plenty of its own complications I'm trying to
work out.

The second possibility is to stash the gc variables in the things that the
zvals point to. For arrays, it'd be the hash table. For objects, well,
that's the problem. Multiple zvals can point to the same object. Still
thinking about this one.

Ruled out possibility: Storing the gc information in the unused bits of
refcount and/or is_ref. This one SOUNDS great but for some reason, bitwise
operations are REALLY, REALLY SLOW. This still eludes me. For some reason,
(zval->is_ref > 0x80) is much faster than (zval_isref & 0x80). zval->is_ref
= 0 is also much, much faster than zval->is_ref &= 0x7f. I know the second
one requires both a memory read and write, but should it be so slow as to
more than wipe out any difference eliminating three bytes from the zval
struct made? Derick suggested this one and I still don't understand why it
didn't work. It would've been perfect. If I sound a bit frustrated here, I
am. =P

> Do you mind posting a patch against your baseline to make it easier for
> us to look at your work?

Give me a day or so... I want to work out a patch that macroizes refcount
manipulation for PHP 5.2 in CVS, then I'll post a patch that patches against
THAT.

> This can end up being a good thing if we manage to make it work well.
> The problem with this stuff though is that the last 5-10% is 95% of the
> work :)

Don't I know it!


David

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to