On Wed, Jul 5, 2023 at 1:15 AM Ilija Tovilo <tovilo.il...@gmail.com> wrote:

> Hi everyone
>
> I recently discovered some unfortunate behavior of the coalesce
> assignment operator (??=) in combination with function calls. Here's
> the TL;DR:
>
> foo()['bar'] ??= 42;
>
> Currently, this code calls foo() twice. This seems rather unexpected.
> The technical reason as to why this happens is not straight-forward,
> but I will attempt to explain below. The behavior was not specified in
> the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is
> completely untested, and as such I don't believe it is by design. My
> proposal is to change it so that foo() is only called once.
>
> This is what is happening in detail.
>
> ??= is special in that it needs to evaluate the lhs (left hand side)
> twice. At first, we need to check if the offset exists, then
> conditionally execute the rhs (right hand side), re-fetch the offset
> and assign the rhs value to it. The reason for the re-fetching of the
> offset is that the evaluation of the rhs may invalidate the offset.
> This is explained in the following blog post:
>
> https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety
> Essentially, the offset may be a pointer into an array element or
> object property. If the rhs frees the array or object, or grows the
> array causing a reallocation (meaning it is moved to some other place
> in memory), the pointer is no longer valid. For this reason, PHP makes
> sure no user code may execute between the fetching of an offset and
> the assignment to it. Normally, that just means evaluating the rhs
> before fetching the offset. In this case, we need to evaluate the lhs
> first to know if we even should evaluate the rhs.
>
> Naively evaluating the lhs again poses a problem for expressions with
> side-effects. For example:
>
> $array[$x++] ??= 42;
>
> We do not want to re-evaluate the entire expression because $x++ will
> lead to a different array offset the second time around. The way this
> is solved is by "memoizing" any compiled expression in the lhs that is
> *not* a variable, meaning not part of the offset that may be
> invalidated. Internally, a variable is considered anything that may be
> written to, i.e. local variables ($foo), properties ($foo->bar,
> Foo::$bar), array offsets ($foo['bar']), and function calls (foo(),
> $foo->bar(), Foo::bar(), $foo(), as they may return a modifiable
> reference). The fact that function calls are included in that list
> leads to the problem presented above. It is not actually necessary to
> exclude them from memoization because their result may not be
> invalidated.
>
> Another inconsistency is that function call arguments will be
> re-evaluated, but only if they are not part of some other expression.
>
> a. foo(bar())['baz'] ??= 42;
> b. foo(bar() + 0)['baz'] ??= 42;
>
> a calls both foo() and bar() twice. b however calls foo() twice but
> bar() only once. That is because the expression bar() + 0 is *not*
> considered a variable and as such gets memoized.
>
>
This is definitely a bug in the original implementation.
In case a function is evaluated twice and returns different values, we
check one value, but assign to another.


> I propose to unconditionally memoize calls (in all forms) when they
> appear in the lhs of a coalesce expression. This will ensure that
> calls are only executed once, including function arguments and the lhs
> of method calls. Consequently, the assignment will be performed on the
> same offset that was previously tested, even if the expression
> contains a function call with side-effects.
>
> The implementation for this change is simple:
> https://github.com/php/php-src/pull/11592
>
> Let me know if you have any concerns. I'm planning on merging this for
> master if there is consensus on the semantics.
>

+1

Thanks. Dmitry.


> Ilija
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>
>

Reply via email to