On Thu, Apr 26, 2012 at 3:45 AM, C.Koy <can5...@gmail.com> wrote:

> As of 5.3.0 this bug does not exist for function names. Only classes and
> interfaces.
>
>
It turns out that if a function is called dynamically (e.g. through a
variable function), the bug does surface.

    <?php
    setlocale(LC_CTYPE, 'tr_TR');
    function IJK() {}
    # succeeds
    IJK();
    $f = 'IJK';
    # causes Fatal error: Call to undefined function IJK()
    $f();

In contrast, if you set the locale for LC_CTYPE on the command line, the
bug doesn't arise at all because the compilation and execution phases both
use the same locale.
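The mismatch can be simulated in plain C without installing a Turkish
locale. In the sketch below, tr_tolower() is a hypothetical stand-in for
tolower() under an ISO-8859-9 Turkish LC_CTYPE, where 'I' lowercases to
dotless i (byte 0xFD) rather than 'i'; it is not the real locale
implementation. The declared name is lowercased with the C locale, as
happens at compile time, and the called name with the Turkish rule, as
happens for a variable function after setlocale() has run.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for tolower() under a Turkish ISO-8859-9 LC_CTYPE.
 * Only the I/dotless-i and dotted-I/i pairs are modeled; everything else
 * falls back to tolower() in the default "C" locale. */
static int tr_tolower(int c)
{
    if (c == 'I')
        return 0xFD;          /* LATIN SMALL LETTER DOTLESS I in ISO-8859-9 */
    if (c == 0xDD)
        return 'i';           /* LATIN CAPITAL LETTER I WITH DOT ABOVE */
    return tolower(c);
}

/* Copy src into dst (capacity cap), lowercasing each byte with lower().
 * Returns 0 on success, -1 if dst is too small. */
static int lower_name(char *dst, size_t cap, const char *src, int (*lower)(int))
{
    size_t n = strlen(src);
    if (n + 1 > cap)
        return -1;
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)lower((unsigned char)src[i]);
    dst[n] = '\0';
    return 0;
}

/* Model of the lookup: the key is built at "compile time" with the C locale,
 * the probe at "run time" with whatever conversion is active by then. */
static int lookup_matches(const char *declared, const char *called,
                          int (*runtime_lower)(int))
{
    char key[64], probe[64];
    if (lower_name(key, sizeof key, declared, tolower) != 0 ||
        lower_name(probe, sizeof probe, called, runtime_lower) != 0)
        return 0;
    return strcmp(key, probe) == 0;
}
```

Here lookup_matches("IJK", "IJK", tolower) succeeds, while
lookup_matches("IJK", "IJK", tr_tolower) fails because the probe becomes
"\xFDjk" instead of "ijk", mirroring the fatal error for $f().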



> Could this be a clue for how to fix it for those as well?


Function names are generally resolved at compile time, before the script's
call to setlocale() has executed. Dynamic function names are resolved at
run time, after setlocale(), which is why the bug surfaces for them. Class
name resolution is deferred until execution time for autoloading and
possibly other purposes. Converting class names to lowercase at compile
time may work. A quick glance at the grammar (Zend/zend_language_parser.y)
shows that class_name, fully_qualified_class_name and class_name_reference
all depend on namespace_name, the rule responsible for parsing class names:

    namespace_name:
          T_STRING { $$ = $1; }
        | namespace_name T_NS_SEPARATOR T_STRING {
              zend_do_build_namespace_name(&$$, &$1, &$3 TSRMLS_CC); }
    ;

However, static_scalar also depends on namespace_name (it covers constant
names, among other things), and I don't believe the names parsed there
should be lowercased; constant names, for example, are case-sensitive by
default. Creating an additional, lowercasing symbol would allow a more
targeted approach: the various class-name symbols would then rely on this
new symbol rather than on namespace_name.

    lc_namespace_name:
          T_STRING {
              zend_str_tolower(Z_STRVAL($1.u.constant), Z_STRLEN($1.u.constant));
              $$ = $1; }
        | lc_namespace_name T_NS_SEPARATOR T_STRING {
              zend_str_tolower(Z_STRVAL($3.u.constant), Z_STRLEN($3.u.constant));
              zend_do_build_namespace_name(&$$, &$1, &$3 TSRMLS_CC); }
    ;

Converting class names to lowercase early may have additional
consequences. For example, it may change how class names appear in error
messages (I didn't dig deep enough to determine this). __CLASS__ should be
unaffected: when a class is defined, its name is parsed as a T_STRING, and
__CLASS__ takes its value from that symbol. This approach also won't fix
the bug for dynamic names.

For dynamic class names, I suspect that altering variable_class_name and
dynamic_class_name_reference to convert the name in one of the ways
described previously (a custom lowercase conversion, or temporarily
switching the locale) would resolve the bug. Changing a number of the
function_call production rules in a similar manner should resolve it for
dynamic function calls. Again, there will likely be unintended
consequences. Alternatively, updating zend_do_begin_dynamic_function_call()
and zend_do_fetch_class() to use a custom conversion should resolve the bug
in the dynamic case.
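One of the conversions mentioned above, temporarily switching the locale,
might look roughly like the sketch below. tolower_in_c_locale() is a
hypothetical helper, not an existing Zend API. Note that setlocale()
changes the locale for the whole process, so this would not be safe in a
threaded SAPI, and every conversion pays for two setlocale() calls plus an
allocation.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lowercase `len` bytes of `str` in place using the
 * "C" locale's tolower(), then restore whatever LC_CTYPE was active.
 * Returns 0 on success, -1 if the current locale name could not be saved. */
static int tolower_in_c_locale(char *str, size_t len)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return -1;

    /* The string returned by setlocale() may be overwritten by the next
     * call, so copy it before switching. */
    size_t n = strlen(cur) + 1;
    char *saved = malloc(n);
    if (saved == NULL)
        return -1;
    memcpy(saved, cur, n);

    setlocale(LC_CTYPE, "C");
    for (size_t i = 0; i < len; i++)
        str[i] = (char)tolower((unsigned char)str[i]);

    setlocale(LC_CTYPE, saved);   /* restore the caller's locale */
    free(saved);
    return 0;
}
```

Called on "IJK" after setlocale(LC_CTYPE, "tr_TR") has succeeded, it
should yield "ijk" and leave the Turkish locale active afterwards.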

I like the idea of using the system default locale for name conversion
(making name resolution independent of the current locale), but I'm
concerned that it will make name lookup slow. Alternatively, a second set
of locale-independent, Unicode-aware conversion functions used only for
identifiers (basically iliaa's original solution, made Unicode-compatible)
would also make name resolution independent of the current locale. Any
time an identifier needs to be converted, it would use one of these
functions. As a run-time optimization, non-dynamic class names could use
the system locale conversion, but that would be separate from fixing this
bug.
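A minimal sketch of a locale-independent identifier conversion, assuming
an ASCII-compatible character set: ident_tolower() is a hypothetical name,
and it only folds 'A' to 'Z', copying every other byte (including UTF-8
lead and continuation bytes) unchanged. It therefore never depends on
LC_CTYPE, but it is not the Unicode-aware variant proposed above; that
would additionally need case-mapping tables for non-ASCII code points.

```c
#include <stddef.h>

/* Hypothetical locale-independent lowercasing for identifiers.
 * Only ASCII 'A'..'Z' are changed; bytes >= 0x80 (e.g. parts of UTF-8
 * sequences) are left untouched, so the result is the same under any
 * LC_CTYPE setting. Assumes an ASCII-compatible execution character set. */
static void ident_tolower(char *str, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];
        if (c >= 'A' && c <= 'Z')
            str[i] = (char)(c - 'A' + 'a');
    }
}
```

Because 'I' always maps to 'i', "IJK" lowercases to "ijk" whether or not
setlocale(LC_CTYPE, "tr_TR") has been called.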
