I found the issue that was tripping me with the optimization I was attempting in #1.
Basically, there seem to be cases where "op_arrays" get destroyed during middle of program execution (e.g., op_array corresponding to the live code in a file seems to get destroyed at the end of the "include" operation; and I guess even the op_arrays corresponding to EVAL operations have the same property). Since "destroy_op_array()" frees the IS_CONST operands unconditionally (i.e. with ref count checking), with the attempted optimization, one can end up with pointers into freed data. The following simple test reproduced the issue: a.php: ------ <?php $y = "hellllllloooo world"; b.php: ------ <?php include 'a.php'; print $y; During execution of live code in "a.php" -- $y will end up pointing to the IS_CONST operand of the "=" (ASSIGN) opcode in the live code op_array for "a.php". But at the end of the 'include a.php' statement in b.php, this op_array gets freed. And an attempt to print $y prints garbage. I am going to explore working around the problem by deferring cleanup of live code op_arrays to the end. [Perhaps for the case of "eval"-- I just won't enable the optimization.] But perhaps the details are not of general interest at this point... so I will refrain from additional mail on the topic. regards, Kannan Muthukkaruppan -----Original Message----- From: Kannan Muthukkaruppan Sent: Thursday, January 10, 2008 6:33 PM To: Andi Gutmans; internals@lists.php.net Subject: RE: [PHP-DEV] php-5.2.3: zend_opcode.c: pass_two() question Thanks, that helps. #1. In my case, I don't have op_arrays that are shared by multiple threads/processes. So I attempted to comment out the IS_CONST patching (is_ref=1/refcount=2) that happens in pass_two() in the hope of assigning constants by simple reference-count increments. While my basic test cases worked fine, I am running into some heap corruptions with non-trivial test cases. I haven't yet been able to come up with a smaller test case yet. I'll continue to debug this.. But it leads me to believe that there is some other reason as well (in addition to the one you point out) for doing this IS_CONST patchup. If you happen to know other reasons, it'll be very helpful. #2. For dealing with "read-only" op-arrays, perhaps one approach would be set the REFCOUNT of these IS_CONSTs to a special high value that is recognized by the run-time as being part of a read-only/immutable zval. And teach the assignment etc. operations to recognize these as special (i.e. increment/decrement on this special value should neither be done, not is it needed). #3. To give a little background of why I started looking into this: I noticed that array literals aren't getting recognized & converted into constants (IS_CONST operands) except when they occur as the RHS of a "static" declaration or the default value of an optional parameter. I prototyped some changes to the compiler so that anytime an array expression contains only literals (including maybe nested literal arrays), it gets treated as an IS_CONST operand. So for something like: $z = array(array("a", "b", "c"), array("d", "e", "f")); The old opcode sequence was: INIT_ARRAY ~4, 'a' ADD_ARRAY_ELEMENT ~4, 'b' ADD_ARRAY_ELEMENT ~4, 'c' INIT_ARRAY ~5, ~4 INIT_ARRAY ~6, 'd' ADD_ARRAY_ELEMENT ~6, 'e' ADD_ARRAY_ELEMENT ~6, 'f' ADD_ARRAY_ELEMENT ~5, ~6 ASSIGN !0, ~5 New opcode sequence is simply: ASSIGN !0, <array> While I did get some performance benefit with this simply by having less opcodes & thus less overhead in the interpreter loop, I had actually hoped to see more. Upon investigation, I realized that I hadn't really avoided the copy (due to the "is_ref=1" on the IS_CONST operand). And that's the reason for posting the question in this thread. Since PHP disallows array literals in the context of "define" & "class constants", it seems to be a common programmatic idiom to assign array literals (such as list of application error messages, or a map of product ids to name, etc.) to a global variable and refer to the array everywhere in the program using that global variable. But then, there is a per-request penalty associated with building/tearing down such globals that are really just pointing to "read-only" data. This overhead would go away if assignments from an IS_CONST operand to a target didn't perform copies. Regards, Kannan -----Original Message----- From: Andi Gutmans [mailto:[EMAIL PROTECTED] Sent: Thursday, January 10, 2008 3:46 PM To: Kannan Muthukkaruppan; internals@lists.php.net Subject: RE: [PHP-DEV] php-5.2.3: zend_opcode.c: pass_two() question Op arrays are read-only. This is a design goal and is especially important for byte code caches where several processes are looking at the same op array. Therefore setting IS_CONST to is_ref=1/refcount=2 we are ensuring that they will not be changed. The copy happens only the first time and afterwards that zval may be used, incremented, etc... Net net, it is intended behavior and very important. Andi > -----Original Message----- > From: Kannan Muthukkaruppan [mailto:[EMAIL PROTECTED] > Sent: Thursday, January 10, 2008 3:36 PM > To: internals@lists.php.net > Subject: [PHP-DEV] php-5.2.3: zend_opcode.c: pass_two() question > > An example assignment such as: > $x = 10; > doesn't seem to get handled as a simple ref-count incrementing > assignment (in zend_assign_to_variable()). Instead, $x ends up getting > a new "zval" into which a copy of the RHS (10) is made. > > This seems to be because the zval/znode corresponding to the literal 10 > is an IS_CONST operand of the assignment opcode. And, during the end > of compilation of a function, the IS_CONST operands in the function's > op_array are patched up as "references" (is_ref is set to 1). > > zend_opcode.c:pass_two() > ... > while (opline < end) { > if (opline->op1.op_type == IS_CONST) { > opline->op1.u.constant.is_ref = 1; > opline->op1.u.constant.refcount = 2; /* Make sure is_ref won't be > reset */ > } > if (opline->op2.op_type == IS_CONST) { > opline->op2.u.constant.is_ref = 1; > opline->op2.u.constant.refcount = 2; > } > ... > > And zend_assign_to_variable(), when it sees the RHS to be a reference > ("is_ref == 1") ends up doing a copy rather than a ref-count > incrementing assignment. > > It seems like performance would be better if we didn't mark constant > operands as "references" in pass_two(). But presumably it is required > to handle some cases that'll otherwise break? I am trying to > understand what those cases are. If someone can shed further light on > this issue, > it is much appreciated... > > regards, > Kannan Muthukkaruppan -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php