Hi, > -----Original Message----- > From: Andi Gutmans [mailto:[EMAIL PROTECTED] > Sent: Thursday, July 12, 2007 9:06 PM > To: David; Scott MacVicar > Cc: Stas Malyshev; Sebastian Bergmann; internals@lists.php.net > Subject: RE: [PHP-DEV] Mid-term Update for Cycle Collector (Google > Summer of Code Project) > > Hi David, > > Isn't finding cycles an NP-complete problem? :) How can you know there > are no cycles without actually looking for them?
That's right. =P I do traverse all objects for which it is possible to have cycles. The time-saving observation is that a cycle that has no external references to it can only be formed when the reference count for one of the objects in the cycle is decremented to a non-zero value. I take those objects and buffer them to look at later (the algorithm used to then mark and sweep these cycles is more efficient when operated on a large buffer). Is that what you mean? > As for the additional if() statements you can try and use branch > prediction hinting to the compiler in order to keep the non-gc version > fast. I haven't heard of that before... Is it some sort of compiler optimization setting? Could you recommend to me some reference that would describe how this can be done? It might help a lot. > Btw, one of the challenges I found in the past is knowing which hash > tables to actually traverse because some variables/hash tables in PHP > are saved outside the symbol tables. Are you really successful in > traversing all of them? Are you saving their addresses during > allocation? If so, how do you know they are intact when you traverse? Those are some tough questions! Here's what I decided to do in my implementation: I decided to assume the cycles can only be formed with the following types of nodes: zvals that are arrays, zvals that point to objects that use the standard handler table, and objects in the objects store that use the standard handler table. Of course, it's conceivable that some custom object will form a reference cycle, but for custom objects, there's really no good way for me to figure out what zvals they reference. I used to try to use the get_properties handler, but for example, the SimpleXML extension actually dynamically generates zvals on each invocation of get_properties. On each collection cycle, I can traverse a node up to three times, and that obviously would be pretty disgusting. I decided to just stick with standard Zend objects. For array nodes, I just traverse its hash table. For zvals that point to objects, I just skip to the object it points to. For objects, I traverse its hash table of properties. How do I know these nodes are intact? I really don't, but I'm assuming if they're still referenced, they are intact. Unless they are freed with a reference count above zero (which does not appear to happen), that assumption is true. I DO know that my starting points (the buffered nodes) are intact because I remove them from my buffer when they are destroyed. (Assumption here is that zval_dtor will be called on any zval before its memory is freed). > Do > you start traversal only when we're supposedly in a stable state? > (re-entrancy problems?) A bad state to start traversal in would be one in which traversal will make me wander into addresses that are bad. I only start collection when the user explicitly invokes it, or when I detect a possible root. I only check for a possible root when a reference to an object in object store is removed, or a couple places in zend_execute, when a zval is assigned to (and overwritten) and zval_ptr_dtor. This should catch all cycles and these places are controlled places inside the Zend engine which are stable. > Also the additional size of zval can be quite an issue. In fact, with > the whole 16bit->32bit transition we made for refcount we reduced > performance. Maybe we can figure out a creative way to keep this down? I've been wracking my brain for a solution to this problem. There are two primary possibilities I can think of right now. One is to have a new struct called zval_gc. Every allocation (and zval on the stack and in temporary storage) will actually allocate the zval_gc which will be exactly the same as zval, only with the two things I need tacked on the end. Then Zend will only copy the zval part around and leave the gc part alone (that part shouldn't be copied anyway) and speed will theoretically be the same. However, this solution has plenty of its own complications I'm trying to work out. The second possibility is to stash the gc variables in the things that the zvals point to. For arrays, it'd be the hash table. For objects, well, that's the problem. Multiple zvals can point to the same object. Still thinking about this one. Ruled out possibility: Storing the gc information in the unused bits of refcount and/or is_ref. This one SOUNDS great but for some reason, bitwise operations are REALLY, REALLY SLOW. This still eludes me. For some reason, (zval->is_ref > 0x80) is much faster than (zval_isref & 0x80). zval->is_ref = 0 is also much, much faster than zval->is_ref &= 0x7f. I know the second one requires both a memory read and write, but should it be so slow as to more than wipe out any difference eliminating three bytes from the zval struct made? Derick suggested this one and I still don't understand why it didn't work. It would've been perfect. If I sound a bit frustrated here, I am. =P > Do you mind posting a patch against your baseline to make it easier for > us to look at your work? Give me a day or so... I want to work out a patch that macroizes refcount manipulation for PHP 5.2 in CVS, then I'll post a patch that patches against THAT. > This can end up being a good thing if we manage to make it work well. > The problem with this stuff though is that the last 5-10% is 95% of the > work :) Don't I know it! David -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php