On Wed, Jul 5, 2023 at 1:15 AM Ilija Tovilo <tovilo.il...@gmail.com> wrote:
> Hi everyone > > I recently discovered some unfortunate behavior of the coalesce > assignment operator (??=) in combination with function calls. Here's > the TL;DR: > > foo()['bar'] ??= 42; > > Currently, this code calls foo() twice. This seems rather unexpected. > The technical reason as to why this happens is not straight-forward, > but I will attempt to explain below. The behavior was not specified in > the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is > completely untested, and as such I don't believe it is by design. My > proposal is to change it so that foo() is only called once. > > This is what is happening in detail. > > ??= is special in that it needs to evaluate the lhs (left hand side) > twice. At first, we need to check if the offset exists, then > conditionally execute the rhs (right hand side), re-fetch the offset > and assign the rhs value to it. The reason for the re-fetching of the > offset is that the evaluation of the rhs may invalidate the offset. > This is explained in the following blog post: > > https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety > Essentially, the offset may be a pointer into an array element or > object property. If the rhs frees the array or object, or grows the > array causing a reallocation (meaning it is moved to some other place > in memory), the pointer is no longer valid. For this reason, PHP makes > sure no user code may execute between the fetching of an offset and > the assignment to it. Normally, that just means evaluating the rhs > before fetching the offset. In this case, we need to evaluate the lhs > first to know if we even should evaluate the rhs. > > Naively evaluating the lhs again poses a problem for expressions with > side-effects. For example: > > $array[$x++] ??= 42; > > We do not want to re-evaluate the entire expression because $x++ will > lead to a different array offset the second time around. The way this > is solved is by "memoizing" any compiled expression in the lhs that is > *not* a variable, meaning not part of the offset that may be > invalidated. Internally, a variable is considered anything that may be > written to, i.e. local variables ($foo), properties ($foo->bar, > Foo::$bar), array offsets ($foo['bar']), and function calls (foo(), > $foo->bar(), Foo::bar(), $foo(), as they may return a modifiable > reference). The fact that function calls are included in that list > leads to the problem presented above. It is not actually necessary to > exclude them from memoization because their result may not be > invalidated. > > Another inconsistency is that function call arguments will be > re-evaluated, but only if they are not part of some other expression. > > a. foo(bar())['baz'] ??= 42; > b. foo(bar() + 0)['baz'] ??= 42; > > a calls both foo() and bar() twice. b however calls foo() twice but > bar() only once. That is because the expression bar() + 0 is *not* > considered a variable and as such gets memoized. > > This is definitely a bug in the original implementation. In case a function is evaluated twice and returns different values, we check one value, but assign to another. > I propose to unconditionally memoize calls (in all forms) when they > appear in the lhs of a coalesce expression. This will ensure that > calls are only executed once, including function arguments and the lhs > of method calls. Consequently, the assignment will be performed on the > same offset that was previously tested, even if the expression > contains a function call with side-effects. > > The implementation for this change is simple: > https://github.com/php/php-src/pull/11592 > > Let me know if you have any concerns. I'm planning on merging this for > master if there is consensus on the semantics. > +1 Thanks. Dmitry. > Ilija > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: https://www.php.net/unsub.php > >