Hi,

this post is a fork of the "[PHP-DEV] Fixing strange foreach behavior" thread. 
It proposes a more efficient for-each mechanism (that does NOT change the 
conceptual behaviour).

Currently, on for-each the engine has to copy the array if that array is 
visible anywhere else in the program, because for-each resets the internal 
position pointer (which is part of the underlying hashtable structure) and 
another part of the program might rely on it.

Essentially the array gets duplicated prematurely, only because of the internal 
position pointer. Of course it might have to be duplicated within the for-each 
loop anyway, but if (and only if) it is actually altered. In most cases one 
just iterates over it without altering it. Please consider the following 
sample, taken from my recent post:

$arr = $obj->arr; // property "arr" is an array
foreach ($arr as $val) ...;

This will currently copy the array, because $arr is also visible through 
$obj->arr although this is not really necessary unless the array is actually 
changed during iteration.

If one used an external position variable that is initialized in FE_RESET 
(TEMPVAR) and then incremented in FE_FETCH, one could just increment the 
ref_count of the array while it is being traversed, without the initial need 
to perform copy-on-write.

Now, if the hashtable is in any way altered during the traversal then the usual 
copy-on-write would kick in because for-each initialization made sure that 
ref_count was incremented before starting traversal. In that case PHP would - 
just like currently - have to duplicate, but only on first actual alteration, 
not prematurely on for-each initialization.
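
To make the idea more concrete, here is a rough toy model in C. It is not 
actual Zend Engine code: the toy_* types and functions are made up for 
illustration, a plain int array stands in for the hashtable, and the names 
toy_fe_reset/toy_fe_fetch only mirror the opcodes loosely. It shows the loop 
taking a reference instead of a copy, and the copy happening only on the 
first write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a refcounted PHP array; NOT the real Zend HashTable. */
typedef struct {
    int    refcount;
    size_t len;
    int   *items;
} toy_array;

/* Loop state kept outside the array, playing the role of the TEMPVAR. */
typedef struct {
    toy_array *arr;
    size_t     pos;
} toy_iter;

/* Allocate a new array holding a copy of src (len must be > 0). */
toy_array *toy_new(const int *src, size_t len) {
    toy_array *a = malloc(sizeof *a);
    if (a == NULL) abort();
    a->items = malloc(len * sizeof *a->items);
    if (a->items == NULL) abort();
    memcpy(a->items, src, len * sizeof *a->items);
    a->refcount = 1;
    a->len = len;
    return a;
}

/* Drop one reference and free the array when nobody holds it anymore. */
void toy_release(toy_array *a) {
    if (--a->refcount == 0) {
        free(a->items);
        free(a);
    }
}

/* FE_RESET analogue: take a reference and start at 0; nothing is copied. */
toy_iter toy_fe_reset(toy_array *a) {
    toy_iter it = { a, 0 };
    a->refcount++;
    return it;
}

/* FE_FETCH analogue: store the next value and return 1, or return 0 at end. */
int toy_fe_fetch(toy_iter *it, int *out) {
    if (it->pos >= it->arr->len) return 0;
    *out = it->arr->items[it->pos++];
    return 1;
}

/* End of loop: give back the reference taken in toy_fe_reset. */
void toy_fe_free(toy_iter *it) {
    toy_release(it->arr);
    it->arr = NULL;
}

/* Write through a variable: duplicate first if the array is shared. */
void toy_write(toy_array **var, size_t idx, int value) {
    if ((*var)->refcount > 1) {
        toy_array *copy = toy_new((*var)->items, (*var)->len);
        (*var)->refcount--;
        *var = copy;
    }
    (*var)->items[idx] = value;
}

/* Iterate over {1, 2, 3} and alter the array halfway through the loop. */
int toy_demo(void) {
    const int src[] = {1, 2, 3};
    toy_array *arr = toy_new(src, 3);      /* plays the role of $arr */
    toy_array *snapshot = arr;
    toy_iter it = toy_fe_reset(arr);
    int v, sum = 0;

    assert(arr->refcount == 2);            /* shared with the loop, no copy */
    while (toy_fe_fetch(&it, &v)) {
        if (v == 2) {
            toy_write(&arr, 2, 30);        /* first write: duplicate now */
            assert(arr != snapshot);
        }
        sum += v;                          /* loop still reads the snapshot */
    }
    assert(arr->items[2] == 30);
    toy_fe_free(&it);
    toy_release(arr);
    return sum;
}
```

Calling toy_demo() returns 6: the loop keeps reading the original values 
1, 2, 3 while the variable ends up pointing at the altered copy. If the 
write were never reached, no duplication would take place at all.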

So in 90% (just a guess) of the cases, when you just traverse without altering 
you get the full benefit of no-copy-necessary, while in the other cases you 
will basically have the previous performance penalty of duplication, but at 
least postponed to the first alteration (which might be inside a branch that is 
not even taken).

Nested for-each loops would not have to revert to copy-on-write either, because 
they have their own pointer.
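
The nested case can be sketched the same way. Again this is only a toy model 
in C with made-up names (mini_array, mini_iter, mini_reset, mini_fetch, 
mini_free), not engine code: two loops walk the same array, each with its own 
position, and the array is never duplicated.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted array; a hypothetical stand-in for the real hashtable. */
typedef struct {
    int        refcount;
    size_t     len;
    const int *items;
} mini_array;

/* Each loop owns one of these, so positions never interfere. */
typedef struct {
    mini_array *arr;
    size_t      pos;
} mini_iter;

/* Start a loop: take a reference, position 0, no copy. */
mini_iter mini_reset(mini_array *a) {
    mini_iter it = { a, 0 };
    a->refcount++;
    return it;
}

/* Fetch the next value; returns 0 once the end is reached. */
int mini_fetch(mini_iter *it, int *out) {
    if (it->pos >= it->arr->len) return 0;
    *out = it->arr->items[it->pos++];
    return 1;
}

/* End a loop: give the reference back. */
void mini_free(mini_iter *it) {
    it->arr->refcount--;
    it->arr = NULL;
}

/* Count all (a, b) pairs produced by two nested loops over one array. */
int mini_nested_pairs(void) {
    const int data[] = {1, 2, 3};
    mini_array arr = { 1, 3, data };
    int a, b, pairs = 0;

    mini_iter outer = mini_reset(&arr);
    while (mini_fetch(&outer, &a)) {
        mini_iter inner = mini_reset(&arr);
        assert(arr.refcount == 3);         /* variable + two loops */
        while (mini_fetch(&inner, &b)) {
            pairs++;                       /* inner position is independent */
        }
        mini_free(&inner);
    }
    mini_free(&outer);
    assert(arr.refcount == 1);             /* only the variable is left */
    return pairs;
}
```

mini_nested_pairs() returns 9: the inner loop restarts from the beginning on 
every outer step without disturbing the outer position.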

This would effectively speed up most for-each operations and would have the 
extra benefit of not having to store an internal pointer in the hashtable 
structure.

Please let me know your thoughts!

Cheers,

Ben

-- 

Benjamin Coutu
Zeyon Technologies Inc.
http://www.zeyos.com


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
