Hi everyone I recently discovered some unfortunate behavior of the coalesce assignment operator (??=) in combination with function calls. Here's the TL;DR:
foo()['bar'] ??= 42; Currently, this code calls foo() twice. This seems rather unexpected. The technical reason as to why this happens is not straight-forward, but I will attempt to explain below. The behavior was not specified in the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is completely untested, and as such I don't believe it is by design. My proposal is to change it so that foo() is only called once. This is what is happening in detail. ??= is special in that it needs to evaluate the lhs (left hand side) twice. At first, we need to check if the offset exists, then conditionally execute the rhs (right hand side), re-fetch the offset and assign the rhs value to it. The reason for the re-fetching of the offset is that the evaluation of the rhs may invalidate the offset. This is explained in the following blog post: https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety Essentially, the offset may be a pointer into an array element or object property. If the rhs frees the array or object, or grows the array causing a reallocation (meaning it is moved to some other place in memory), the pointer is no longer valid. For this reason, PHP makes sure no user code may execute between the fetching of an offset and the assignment to it. Normally, that just means evaluating the rhs before fetching the offset. In this case, we need to evaluate the lhs first to know if we even should evaluate the rhs. Naively evaluating the lhs again poses a problem for expressions with side-effects. For example: $array[$x++] ??= 42; We do not want to re-evaluate the entire expression because $x++ will lead to a different array offset the second time around. The way this is solved is by "memoizing" any compiled expression in the lhs that is *not* a variable, meaning not part of the offset that may be invalidated. Internally, a variable is considered anything that may be written to, i.e. local variables ($foo), properties ($foo->bar, Foo::$bar), array offsets ($foo['bar']), and function calls (foo(), $foo->bar(), Foo::bar(), $foo(), as they may return a modifiable reference). The fact that function calls are included in that list leads to the problem presented above. It is not actually necessary to exclude them from memoization because their result may not be invalidated. Another inconsistency is that function call arguments will be re-evaluated, but only if they are not part of some other expression. a. foo(bar())['baz'] ??= 42; b. foo(bar() + 0)['baz'] ??= 42; a calls both foo() and bar() twice. b however calls foo() twice but bar() only once. That is because the expression bar() + 0 is *not* considered a variable and as such gets memoized. I propose to unconditionally memoize calls (in all forms) when they appear in the lhs of a coalesce expression. This will ensure that calls are only executed once, including function arguments and the lhs of method calls. Consequently, the assignment will be performed on the same offset that was previously tested, even if the expression contains a function call with side-effects. The implementation for this change is simple: https://github.com/php/php-src/pull/11592 Let me know if you have any concerns. I'm planning on merging this for master if there is consensus on the semantics. Ilija -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php