[PHP-DEV] ??= and function calls

Ilija Tovilo Tue, 04 Jul 2023 15:15:50 -0700

Hi everyone

I recently discovered some unfortunate behavior of the coalesce
assignment operator (??=) in combination with function calls. Here's
the TL;DR:


foo()['bar'] ??= 42;

Currently, this code calls foo() twice. This seems rather unexpected.
The technical reason why this happens is not straightforward,
but I will attempt to explain below. The behavior was not specified in
the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is
completely untested, and as such I don't believe it is by design. My
proposal is to change it so that foo() is only called once.
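
The double call can be observed with a small probe. This is only a
sketch: the Probe class and the body of foo() are assumptions made for
illustration, and ArrayObject is used so that the write made through
the return value stays visible without returning by reference.

```php
<?php
// Illustrative probe: Probe and the body of foo() are assumptions,
// not code from the report.
final class Probe
{
    public static int $calls = 0;
    public static ?ArrayObject $store = null;
}

// Counts each invocation and hands back the shared ArrayObject.
function foo(): ArrayObject
{
    Probe::$calls++;
    return Probe::$store;
}

Probe::$store = new ArrayObject();

foo()['bar'] ??= 42;

// 2 with the behavior described above, 1 if the call is evaluated once.
printf("foo() calls: %d\n", Probe::$calls);
printf("bar => %d\n", Probe::$store['bar']);
```

The count printed depends on whether the running PHP build re-evaluates
the call, so no fixed output is given here.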

This is what is happening in detail.

??= is special in that it needs to evaluate the lhs (left-hand side)
twice. First, we need to check whether the offset exists, then
conditionally execute the rhs (right-hand side), re-fetch the offset
and assign the rhs value to it. The reason for re-fetching the
offset is that the evaluation of the rhs may invalidate the offset.
This is explained in the following blog post:
https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety
Essentially, the offset may be a pointer into an array element or
object property. If the rhs frees the array or object, or grows the
array causing a reallocation (meaning it is moved to some other place
in memory), the pointer is no longer valid. For this reason, PHP makes
sure no user code may execute between the fetching of an offset and
the assignment to it. Normally, that just means evaluating the rhs
before fetching the offset. In this case, we need to evaluate the lhs
first to know if we even should evaluate the rhs.
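
To make the invalidation scenario concrete, here is a sketch in which
the rhs grows the very array whose offset is being assigned. grow() is
a hypothetical helper, not something from the engine or the RFC. The
assignment still lands correctly because the offset is fetched again
after the rhs has run.

```php
<?php
// grow() is a hypothetical helper: it appends many elements through a
// reference, so the array's storage has to grow, then returns 42.
function grow(array &$array): int
{
    for ($i = 0; $i < 1000; $i++) {
        $array[] = $i;
    }
    return 42;
}

$data = [];

// 'key' is checked first, then grow() runs, then 'key' is fetched
// again for the write.
$data['key'] ??= grow($data);

var_dump(count($data));  // int(1001)
var_dump($data['key']);  // int(42)
```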

Naively evaluating the lhs again poses a problem for expressions with
side-effects. For example:

$array[$x++] ??= 42;

We do not want to re-evaluate the entire expression because $x++ will
lead to a different array offset the second time around. The way this
is solved is by "memoizing" any compiled expression in the lhs that is
*not* a variable, meaning not part of the offset that may be
invalidated. Internally, a variable is considered anything that may be
written to, i.e. local variables ($foo), properties ($foo->bar,
Foo::$bar), array offsets ($foo['bar']), and function calls (foo(),
$foo->bar(), Foo::bar(), $foo(), as they may return a modifiable
reference). The fact that function calls are included in that list
leads to the problem presented above. It is not actually necessary to
exclude them from memoization, because their result cannot be
invalidated.
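
The memoization of the non-variable part ($x++) can be checked from
userland. A minimal sketch, with variable names chosen only for
illustration:

```php
<?php
$x = 0;
$array = [];

// $x++ is evaluated once; its memoized result (0) is reused when the
// offset is fetched again for the write.
$array[$x++] ??= 42;

var_dump($x);         // int(1)
var_dump($array[0]);  // int(42)
```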

Another inconsistency is that function call arguments will be
re-evaluated, but only if they are not part of some other expression.

a. foo(bar())['baz'] ??= 42;
b. foo(bar() + 0)['baz'] ??= 42;

Example a calls both foo() and bar() twice. Example b, however, calls
foo() twice but bar() only once. That is because the expression
bar() + 0 is *not* considered a variable and as such gets memoized.
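
The difference between a and b can be measured with call counters. This
is a sketch under assumptions: the Calls class and the bodies of foo()
and bar() are made up for the purpose, and foo() returns a fresh
ArrayObject each time so that 'baz' is never set beforehand.

```php
<?php
// Illustrative only: Calls, foo() and bar() are assumed helpers.
final class Calls
{
    public static int $foo = 0;
    public static int $bar = 0;
}

function bar(): int
{
    Calls::$bar++;
    return 1;
}

// The argument is ignored; it only exists so bar() can appear in the call.
function foo(int $unused): ArrayObject
{
    Calls::$foo++;
    return new ArrayObject();
}

// Example a: the argument is a plain call.
foo(bar())['baz'] ??= 42;
$a = [Calls::$foo, Calls::$bar];

Calls::$foo = Calls::$bar = 0;

// Example b: the argument is wrapped in another expression.
foo(bar() + 0)['baz'] ??= 42;
$b = [Calls::$foo, Calls::$bar];

printf("a: foo=%d bar=%d\n", $a[0], $a[1]);
printf("b: foo=%d bar=%d\n", $b[0], $b[1]);
```

With the behavior described above, this would report 2/2 for a and 2/1
for b; if calls are memoized, both lines would report 1/1.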

I propose to unconditionally memoize calls (in all forms) when they
appear in the lhs of a coalesce expression. This will ensure that
calls are only executed once, including function arguments and the lhs
of method calls. Consequently, the assignment will be performed on the
same offset that was previously tested, even if the expression
contains a function call with side-effects.
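
The intended semantics correspond roughly to evaluating the call once
into a temporary and then applying ??= to that temporary. A userland
approximation, assuming foo() returns an object such as ArrayObject
(the Once class and foo() body are hypothetical):

```php
<?php
// Illustrative only: Once and the body of foo() are assumptions.
final class Once
{
    public static int $calls = 0;
    public static ?ArrayObject $store = null;
}

function foo(): ArrayObject
{
    Once::$calls++;
    return Once::$store;
}

Once::$store = new ArrayObject();

// Evaluate the call a single time, then test and assign on the result.
$target = foo();
$target['bar'] ??= 42;

var_dump(Once::$calls);         // int(1)
var_dump(Once::$store['bar']);  // int(42)
```

This only approximates the proposal: it works here because objects are
passed by handle, whereas a function returning a plain array would need
to return by reference for the write to be visible.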

The implementation for this change is simple:
https://github.com/php/php-src/pull/11592

Let me know if you have any concerns. I'm planning on merging this for
master if there is consensus on the semantics.

Ilija

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
