I'm all in favour of this. It makes the language much more predictable and amenable to optimisation and static analysis overall, at the cost of what IMO is a very, very minor BC break.
To think that all this time, functions I've written which accept a callable could be passed something like "extract" and depended on whatever behaviour results. Eugh. On Fri, Apr 29, 2016 at 7:48 PM, Nikita Popov <nikita....@gmail.com> wrote: > Hi internals! > > Welcome to another edition of "crazy PHP edge-cases you don't want to know > about". > > I'd like to introduce a restriction that forbids performing dynamic calls > to scope introspection and modification functions. "Dynamic calls" here > means things like $fn(), call_user_func($fn) and array_map($fn, ...). > "Scope introspection functions" refers to the following functions that in > one way or another inspect or modify parent stack frames: > > * extract() > * compact() > * get_defined_vars() > * parse_str() with one arg > * mb_parse_str() with one arg > * func_get_args() > * func_get_arg() > * func_num_args() > > I'd like to introduce this restriction for a number of reasons, which will > be outlined in the following. There are essentially two core problems: A) > It is not clear how these functions should behave in this situation -- > indeed I will show examples of behavior differing due to inconsequential > changes, differing between different PHP runtimes or versions and just > generally behaving crazily. B) These calls pose a stability problem (yay > segfaults) and violate assumptions of existing optimizations (yay more > segfaults). > > A) The primary issue is that dynamic calls to these functions have unclear > behavior and may lead to some very odd behavior. They all fundamentally > work by inspecting higher stack frames, but don't agree on which frame they > should operate on. > > Example #1: > > namespace { > function test($a, $b, $c) { > var_dump(call_user_func('func_num_args')); > } > test(1, 2, 3); > } > > namespace Foo { > function test($a, $b, $c) { > var_dump(call_user_func('func_num_args')); > } > test(1, 2, 3); > } > > This code will print int(3) int(1) on PHP 7 and HHVM (and two times int(1) > on PHP 5). The reason is that in the non-namespaced case the number of > arguments of the test() frame is returned, while in the namespaced case the > number of arguments of the call_user_func() frame is returned, because of > internal differences in stack frame management. > > Example #2: > > function test() { > array_map('extract', [['a' => 42]]); > var_dump($a); > } > test(); > > This will print int(42) on PHP 5+7, but result in an undefined variable on > HHVM. The reason is that HHVM will extract ['a' => 42] into the scope of > the array_map() frame, rather than the test() frame. It does this because > HHVM implements array_map as a userland (HHAS) function, rather than an > internal function. > > One might write this off as a bug in the HHVM implementation, but really > this illustrates a dichotomy between internal and userland functions with > regard to dynamic calls of these functions. Namely, if you were to > reimplement the array_map function in userland > > function array_map($fn, $array) { > $result = []; > foreach ($array as $k => $v) { > $result[$k] = $fn($v); > } > return $result; > } > > and then try the same snippet as Example #2, it would indeed extract the > array into the scope of array_map() and not the calling test() function. I > hope this helps to further illustrate why calling these functions > dynamically is a problem: They will generally be executed in a different > scope than the one where the callback is defined. This means you can > actually arbitrarily modify the scope of functions that accept callbacks, > even though they were certainly not designed for this use. E.g. you can > switch the $fn callback in the middle of the array_map execution using > something like: > > array_map('extract', [['fn' => ...]]); > > Just imagine the possibilities of this newly discovered feature! But this > is only where it starts. PHP has a number of magic callbacks that may be > implicitly executed in all kinds of contexts. For example, what happens if > we use one of these in spl_autoload_register? > > Example #3: > > spl_autoload_register('parse_str'); > function test() { > $FooBar = 1; > class_exists('FooBar'); > var_dump($FooBar); // string(0) "" > } > test(); > > Now any invocation of the autoloader (here using class_exists, but can be > generalized to new or anything else) will create a variable for the class > name in the local scope (with value ""). Of course there's more fun to be > had here (tick functions!), but let's leave it at that and get to the next > point. > > B) The second issue is stability. As might be expected, nobody has bothered > testing edge-cases of dynamic calls to these functions previously. Recently > two segfaults relating to this were found, see bug #71220 and bug #72102. > However, these are "just bugs". The more important issue is that these > dynamic calls to scope modifying functions go against assumptions in the > current optimizer. For example the following very simple script currently > crashes if opcache is enabled, because $i is incorrectly determined to be > an integer: > > function test() { > $i = 1; > array_map('extract', [['i' => new stdClass]]); > $i += 1; > var_dump($i); > } > test(); > > This is, of course, a bug in the optimizer and not in PHP. However, if we > try to catch this kind of situation in the optimizer we will have to do > very pessimistic assumptions (especially if you consider things like the > spl_autoload_register example), for a "feature" nobody needs and that > doesn't work particularly well anyway (see A). > > Due to all the aforementioned issues, I'd like to forbid these specific > dynamic calls in PHP 7.1. I highly doubt that this will have any BC impact. > A patch is available here: https://github.com/php/php-src/pull/1886 > > Does anybody see a problem with this change? > > Thanks, > Nikita >