On Thu, Jan 22, 2015 at 9:33 AM, Benjamin Coutu <ben.co...@zeyos.com> wrote:
> Hi,
>
> this post is a fork of the "[PHP-DEV] Fixing strange foreach behavior"
> thread. It proposes a more efficient for-each mechanism (that does NOT
> change the conceptual behaviour).
>
> Currently on for-each the engine will have to copy the array if that array
> is visible anywhere else in the program because it will reset the internal
> position pointer (which is part of the underlying hashtable structure) and
> another part of the program might rely on it.
>
> Essentially the array gets duplicated prematurely, only because of the
> internal position pointer. Of course it might have to be duplicated anyway
> within the for-each loop, but if (and only if) it is actually altered. In
> most cases one just iterates over it without altering it. Please consider
> the following sample, taken from my recent post:
>
> $arr = $obj->arr; // property "arr" is an array
> foreach ($arr as $val) ...;
>
> This will currently copy the array, because $arr is also visible through
> $obj->arr, although this is not really necessary unless the array is
> actually changed during iteration.
>
> If one used an external position variable that is initialized in
> FE_RESET (TEMPVAR) and then incremented in FE_FETCH, one could just
> increment the ref_count of the array while it is being traversed, without
> the initial need to perform copy-on-write.
>
> Now, if the hashtable is in any way altered during the traversal, then the
> usual copy-on-write would kick in, because for-each initialization made sure
> that ref_count was incremented before starting traversal. In that case PHP
> would - just like currently - have to duplicate, but only on first actual
> alteration, not prematurely on for-each initialization.
>
> So in 90% (just a guess) of the cases, when you just traverse without
> altering you get the full benefit of no-copy-necessary, while in the other
> cases you will basically have the previous performance penalty of
> duplication, but at least postponed to the first alteration (which might be
> inside a branch that is not even taken).
>
> Nested for-each loops would not have to revert to copy-on-write either,
> because they have their own pointer.
>
> This would effectively speed up most for-each operations and would have
> the extra benefit of not having to store an internal pointer in the
> hashtable structure.
>
> Please let me know your thoughts!
>
> Cheers,
>
> Ben
>

Doing this was the idea I had in mind as well, i.e. change the semantics of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it).

Note that this will differ from the current behavior in a number of ways.
In particular it means that changes to arrays that were references prior to
iteration will not influence the iteration.

The real question is what we should do in the by-reference case. Given that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would we
just stick with the previous behavior (using the hash position hack) for
that?

Nikita
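
To make the by-value difference mentioned above concrete, here is a minimal
sketch. The variable names ($arr, $ref, $seen) are made up for illustration.
It assumes an engine where by-value foreach iterates over a snapshot of the
array, as proposed in this thread; under the current behavior the loop would
also visit the appended element, because $arr is a reference and is therefore
iterated in place.

```php
<?php
$arr = [1, 2, 3];
$ref = &$arr;   // make $arr a reference before the loop starts

$seen = [];
foreach ($arr as $val) {
    if ($val === 1) {
        $arr[] = 4;   // modify the array while iterating over it
    }
    $seen[] = $val;   // record which elements the loop actually visits
}

// Assuming copy semantics, the loop walks the snapshot: $seen is [1, 2, 3].
// The write still lands in the real array: $arr is [1, 2, 3, 4].
var_dump($seen, $arr);
```

The by-reference form (foreach ($arr as &$val)) is deliberately left out of
the sketch, since how it should behave is exactly the open question.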