I'm all in favour of this. It makes the language much more predictable and
amenable to optimisation and static analysis overall, at the cost of what
IMO is a very, very minor BC break.

To think that all this time, functions I've written which accept a callable
could be passed something like "extract" and depended on whatever behaviour
results. Eugh.

On Fri, Apr 29, 2016 at 7:48 PM, Nikita Popov <nikita....@gmail.com> wrote:

> Hi internals!
>
> Welcome to another edition of "crazy PHP edge-cases you don't want to know
> about".
>
> I'd like to introduce a restriction that forbids performing dynamic calls
> to scope introspection and modification functions. "Dynamic calls" here
> means things like $fn(), call_user_func($fn) and array_map($fn, ...).
> "Scope introspection functions" refers to the following functions that in
> one way or another inspect or modify parent stack frames:
>
>  * extract()
>  * compact()
>  * get_defined_vars()
>  * parse_str() with one arg
>  * mb_parse_str() with one arg
>  * func_get_args()
>  * func_get_arg()
>  * func_num_args()
>
> I'd like to introduce this restriction for a number of reasons, which will
> be outlined in the following. There are essentially two core problems: A)
> It is not clear how these functions should behave in this situation --
> indeed I will show examples of behavior differing due to inconsequential
> changes, differing between different PHP runtimes or versions and just
> generally behaving crazily. B) These calls pose a stability problem (yay
> segfaults) and violate assumptions of existing optimizations (yay more
> segfaults).
>
> A) The primary issue is that dynamic calls to these functions have unclear
> behavior and may lead to some very odd behavior. They all fundamentally
> work by inspecting higher stack frames, but don't agree on which frame they
> should operate on.
>
> Example #1:
>
> namespace {
>     function test($a, $b, $c) {
>         var_dump(call_user_func('func_num_args'));
>     }
>     test(1, 2, 3);
> }
>
> namespace Foo {
>     function test($a, $b, $c) {
>         var_dump(call_user_func('func_num_args'));
>     }
>     test(1, 2, 3);
> }
>
> This code will print int(3) int(1) on PHP 7 and HHVM (and two times int(1)
> on PHP 5). The reason is that in the non-namespaced case the number of
> arguments of the test() frame is returned, while in the namespaced case the
> number of arguments of the call_user_func() frame is returned, because of
> internal differences in stack frame management.
>
> Example #2:
>
> function test() {
>     array_map('extract', [['a' => 42]]);
>     var_dump($a);
> }
> test();
>
> This will print int(42) on PHP 5+7, but result in an undefined variable on
> HHVM. The reason is that HHVM will extract ['a' => 42] into the scope of
> the array_map() frame, rather than the test() frame. It does this because
> HHVM implements array_map as a userland (HHAS) function, rather than an
> internal function.
>
> One might write this off as a bug in the HHVM implementation, but really
> this illustrates a dichotomy between internal and userland functions with
> regard to dynamic calls of these functions. Namely, if you were to
> reimplement the array_map function in userland
>
> function array_map($fn, $array) {
>     $result = [];
>     foreach ($array as $k => $v) {
>         $result[$k] = $fn($v);
>     }
>     return $result;
> }
>
> and then try the same snippet as Example #2, it would indeed extract the
> array into the scope of array_map() and not the calling test() function. I
> hope this helps to further illustrate why calling these functions
> dynamically is a problem: They will generally be executed in a different
> scope than the one where the callback is defined. This means you can
> actually arbitrarily modify the scope of functions that accept callbacks,
> even though they were certainly not designed for this use. E.g. you can
> switch the $fn callback in the middle of the array_map execution using
> something like:
>
> array_map('extract', [['fn' => ...]]);
>
> Just imagine the possibilities of this newly discovered feature! But this
> is only where it starts. PHP has a number of magic callbacks that may be
> implicitly executed in all kinds of contexts. For example, what happens if
> we use one of these in spl_autoload_register?
>
> Example #3:
>
> spl_autoload_register('parse_str');
> function test() {
>     $FooBar = 1;
>     class_exists('FooBar');
>     var_dump($FooBar); // string(0) ""
> }
> test();
>
> Now any invocation of the autoloader (here using class_exists, but can be
> generalized to new or anything else) will create a variable for the class
> name in the local scope (with value ""). Of course there's more fun to be
> had here (tick functions!), but let's leave it at that and get to the next
> point.
>
> B) The second issue is stability. As might be expected, nobody has bothered
> testing edge-cases of dynamic calls to these functions previously. Recently
> two segfaults relating to this were found, see bug #71220 and bug #72102.
> However, these are "just bugs". The more important issue is that these
> dynamic calls to scope modifying functions go against assumptions in the
> current optimizer. For example the following very simple script currently
> crashes if opcache is enabled, because $i is incorrectly determined to be
> an integer:
>
> function test() {
>     $i = 1;
>     array_map('extract', [['i' => new stdClass]]);
>     $i += 1;
>     var_dump($i);
> }
> test();
>
> This is, of course, a bug in the optimizer and not in PHP. However, if we
> try to catch this kind of situation in the optimizer we will have to do
> very pessimistic assumptions (especially if you consider things like the
> spl_autoload_register example), for a "feature" nobody needs and that
> doesn't work particularly well anyway (see A).
>
> Due to all the aforementioned issues, I'd like to forbid these specific
> dynamic calls in PHP 7.1. I highly doubt that this will have any BC impact.
> A patch is available here: https://github.com/php/php-src/pull/1886
>
> Does anybody see a problem with this change?
>
> Thanks,
> Nikita
>

Reply via email to