Hi internals!

Welcome to another edition of "crazy PHP edge-cases you don't want to know
about".

I'd like to introduce a restriction that forbids performing dynamic calls
to scope introspection and modification functions. "Dynamic calls" here
means things like $fn(), call_user_func($fn) and array_map($fn, ...).
"Scope introspection functions" refers to the following functions that in
one way or another inspect or modify parent stack frames:

 * extract()
 * compact()
 * get_defined_vars()
 * parse_str() with one arg
 * mb_parse_str() with one arg
 * func_get_args()
 * func_get_arg()
 * func_num_args()

I'd like to introduce this restriction for a number of reasons, which will
be outlined in the following. There are essentially two core problems: A)
It is not clear how these functions should behave in this situation --
indeed I will show examples of behavior differing due to inconsequential
changes, differing between different PHP runtimes or versions and just
generally behaving crazily. B) These calls pose a stability problem (yay
segfaults) and violate assumptions of existing optimizations (yay more
segfaults).

A) The primary issue is that dynamic calls to these functions have unclear
behavior and may lead to some very odd behavior. They all fundamentally
work by inspecting higher stack frames, but don't agree on which frame they
should operate on.

Example #1:

namespace {
    function test($a, $b, $c) {
        var_dump(call_user_func('func_num_args'));
    }
    test(1, 2, 3);
}

namespace Foo {
    function test($a, $b, $c) {
        var_dump(call_user_func('func_num_args'));
    }
    test(1, 2, 3);
}

This code will print int(3) int(1) on PHP 7 and HHVM (and two times int(1)
on PHP 5). The reason is that in the non-namespaced case the number of
arguments of the test() frame is returned, while in the namespaced case the
number of arguments of the call_user_func() frame is returned, because of
internal differences in stack frame management.

Example #2:

function test() {
    array_map('extract', [['a' => 42]]);
    var_dump($a);
}
test();

This will print int(42) on PHP 5+7, but result in an undefined variable on
HHVM. The reason is that HHVM will extract ['a' => 42] into the scope of
the array_map() frame, rather than the test() frame. It does this because
HHVM implements array_map as a userland (HHAS) function, rather than an
internal function.

One might write this off as a bug in the HHVM implementation, but really
this illustrates a dichotomy between internal and userland functions with
regard to dynamic calls of these functions. Namely, if you were to
reimplement the array_map function in userland

function array_map($fn, $array) {
    $result = [];
    foreach ($array as $k => $v) {
        $result[$k] = $fn($v);
    }
    return $result;
}

and then try the same snippet as Example #2, it would indeed extract the
array into the scope of array_map() and not the calling test() function. I
hope this helps to further illustrate why calling these functions
dynamically is a problem: They will generally be executed in a different
scope than the one where the callback is defined. This means you can
actually arbitrarily modify the scope of functions that accept callbacks,
even though they were certainly not designed for this use. E.g. you can
switch the $fn callback in the middle of the array_map execution using
something like:

array_map('extract', [['fn' => ...]]);

Just imagine the possibilities of this newly discovered feature! But this
is only where it starts. PHP has a number of magic callbacks that may be
implicitly executed in all kinds of contexts. For example, what happens if
we use one of these in spl_autoload_register?

Example #3:

spl_autoload_register('parse_str');
function test() {
    $FooBar = 1;
    class_exists('FooBar');
    var_dump($FooBar); // string(0) ""
}
test();

Now any invocation of the autoloader (here using class_exists, but can be
generalized to new or anything else) will create a variable for the class
name in the local scope (with value ""). Of course there's more fun to be
had here (tick functions!), but let's leave it at that and get to the next
point.

B) The second issue is stability. As might be expected, nobody has bothered
testing edge-cases of dynamic calls to these functions previously. Recently
two segfaults relating to this were found, see bug #71220 and bug #72102.
However, these are "just bugs". The more important issue is that these
dynamic calls to scope modifying functions go against assumptions in the
current optimizer. For example the following very simple script currently
crashes if opcache is enabled, because $i is incorrectly determined to be
an integer:

function test() {
    $i = 1;
    array_map('extract', [['i' => new stdClass]]);
    $i += 1;
    var_dump($i);
}
test();

This is, of course, a bug in the optimizer and not in PHP. However, if we
try to catch this kind of situation in the optimizer we will have to do
very pessimistic assumptions (especially if you consider things like the
spl_autoload_register example), for a "feature" nobody needs and that
doesn't work particularly well anyway (see A).

Due to all the aforementioned issues, I'd like to forbid these specific
dynamic calls in PHP 7.1. I highly doubt that this will have any BC impact.
A patch is available here: https://github.com/php/php-src/pull/1886

Does anybody see a problem with this change?

Thanks,
Nikita

Reply via email to