On Fri, Feb 20, 2015 at 8:29 AM, Marcio Almada <marcio.w...@gmail.com> wrote:
> Hi internals,
>
> I'd like to put the "Context Sensitive Lexer" RFC into discussion phase:
>
> RFC: https://wiki.php.net/rfc/context_sensitive_lexer
> TL;DR commit: https://github.com/marcioAlmada/php-src/commit/c01014f9
> PR: https://github.com/php/php-src/pull/1054
>
> PHP currently has ~64 globally reserved words. Not infrequently, these
> reserved words end up clashing with legit alternatives to userland API
> declarations. This RFC proposes minimal changes to have a context sensitive
> lexer with support for semi-reserved words on PHP7 without causing
> maintenance issues.
>
> This could be especially useful to:
>
> - Reduce the surface of BC breaks whenever new keywords are introduced
> - Avoid restricting userland APIs. Dispensing the need for hacks like
> unecessary magic method calls or prefixed identifiers.
>
> The patch is 98% finished, the entire test suite is passing. I'm still
> adding more tests to it but the hard part is done. So it's time to discuss!
>
> Sincerely,
> Márcio Almada
>

I think we all agree that it would be nice to not be so strict about reserved keywords in some places. As such, this RFC hinges on questions of implementation.

The RFC uses a purely lexer-based approach, which is nice in principle, because ext/tokenizer benefits from it as well. The disadvantage of doing this in the lexer and in the scope that you're proposing (i.e. including class names) is that it requires reimplementing quite a number of parser rules via lookahead in the lexer.

This means that a) the implementation depends on a complete understanding of the PHP syntax, otherwise we'll miss some edge cases or be too strict in others, and b) it may limit us in the future, because we may not be able to introduce syntax that can't be reasonably recognized with simple lexer state management or lookahead.

To give you an example of a), your patch currently handles a single interface name properly

    nikic@saturn:~/php-src$ sapi/cli/php -r 'class Foo implements Interface {}'
    Fatal error: Interface 'Interface' not found in Command line code on line 1

but fails as soon as you implement multiple interfaces:

    nikic@saturn:~/php-src$ sapi/cli/php -r 'class Foo implements Interface, Array {}'
    Parse error: syntax error, unexpected 'Array' (T_ARRAY), expecting identifier (T_STRING) or namespace (T_NAMESPACE) or \\ (T_NS_SEPARATOR) in Command line code on line 1

So, I'm sure this can be worked around with a couple of new lexer rules, I'm just trying to show the systematic issues of this approach.

An example for b) is harder to come by (as I'm not terribly familiar with what we can easily detect in the lexer and what we can't). One thing that comes to mind is supporting a short lambda syntax like the one available in Hack:

    (ClassName $a, $b, $c, $d) ==> $a

As this has no prefixing "function" or similar, I suspect that it may be rather hard to detect that "ClassName" is actually a class name here and requires special treatment. Un-reserving class names now may make features like this impossible (or unnecessarily hard) to implement in the future.

Due to these issues, I don't like the RFC in the current form - I think it's too ambitious. Class names simply occur in too many and diverse places.

I would suggest going with a more limited approach instead, which targets only method and class constant names. I.e. the label after -> and :: should not be reserved (we already do this for ->) and the label after "function" and "const" shouldn't be either.

Of course this would also allow defining global reserved-keyword function/const names as well, so we might want to check their names against the list of reserved keywords. Though even that is just a courtesy to the user, e.g. it's already possible to define and access reserved-keyword constants using define() and constant().

Nikita
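
For reference, the two existing behaviours mentioned above (a label after -> is not treated as reserved, and reserved-keyword constants can be created with define() and read with constant()) can be checked with a short script. This is only a minimal sketch: it assumes ext/tokenizer is loaded, and the helper name tokenNames() and the sample snippet passed to it are made up for illustration, not taken from the RFC or the patch.

```php
<?php
// Hypothetical helper: collect the token names the lexer assigns to
// every occurrence of the word "list" in a piece of PHP source.
function tokenNames($code)
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        // Single-character tokens are plain strings; named tokens are arrays
        // of the form [id, text, line].
        if (is_array($token) && $token[1] === 'list') {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// The first "list" follows "->", the second is the list() construct.
print_r(tokenNames('<?php $a->list; list($x) = $y;'));

// A reserved keyword used as a global constant name via define()/constant().
define('list', 42);
var_dump(constant('list'));
```

The first call should report T_STRING for the label after -> and T_LIST for the bare keyword, which is the kind of lexer state handling the more limited proposal would extend to ::, "function" and "const". The second part should dump int(42).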