On Mon, Apr 20, 2015 at 5:32 PM, Marcio Almada <marcio.w...@gmail.com> wrote:
> Hi,
>
> The Context Sensitive Lexer RFC
> <https://wiki.php.net/rfc/context_sensitive_lexer> passed :) and by the
> time of the voting phase, we decided to vote for the feature only and later
> discuss quality analysis on the implementations aimed to fulfill the RFC.
>
> First, I'd like to thank you all for that decision. I know that's an
> exception on the RFC process, and I am glad we choose this path for the
> following reasons:
>
> 1. Voting different RFCs describing the same feature with slightly
> different implementations would cause us to waste many voting cycles (maybe
> entire release cycles) without a guarantee of quality. The main reason to
> establish an RFC process is to chase for quality - to follow all the rules
> strictly, in this case, would be to contradict our main objective here.
>
> 2. Knowing in advance that the feature was already approved is a motivating
> factor to go on and try a good number of possible implementations and
> propose the best ones, instead of recursively voting until an
> implementation pass.
>
> With that said, this is the proposed pull request:
>
> Pull Request: https://github.com/php/php-src/pull/1221
> Diff: https://github.com/php/php-src/pull/1221/files
> RFC: https://wiki.php.net/rfc/context_sensitive_lexer
>
> There is sufficient description of the pull request itself. The ones that
> participated in the previous discussions probably won't have trouble to
> understand it, but feel free to share any doubts or suggestions here, if
> necessary.
>
> Thanks,
> Márcio
>

Sorry for late response, forgot about this RFC.

I've only glanced over it, but the patch looks okay from the technical side.

The thing that's bothering me is the fact that this patch is basically saying: "It is no longer possible to correctly tokenize PHP without also parsing it."
For example, if you're writing a syntax highlighter for PHP and you want that syntax highlighter to be correct, you'll be writing not only a lexer, but also a parser for PHP (which is significantly more complicated). Actually it's worse than that: the approach of running a parser concurrently with the token collection does not work for highlighting code snippets, for example, where the snippet may not form fully valid syntax.

Syntax highlighting is only an example; this applies to any external tooling that's not written in PHP and does not have the benefit of using token_get_all().

I don't know how important this is to us, but I'm somewhat wary of going further in the C++ direction (where you essentially need a full type-analyzer to do a parse).

This is why I still prefer the dead-simple approach of making the next label after :: and "function" unreserved (which is what we already do for the label after ->), combined with forbidding reserved names for free functions in the compiler (similar to the blacklist we have for classes). This doesn't cover everything (like trait adaptations), but I think it covers the 97% case (and actually lets us really allow all names, without exceptions).

Nikita
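
To make the simpler approach concrete, here is a rough sketch of the "next label after ->, :: or function is unreserved" rule as a toy tokenizer. It is written in Python purely for illustration and is not PHP's lexer or the code from the pull request; the reserved-word subset, the token names and the `tokenize` helper are all assumptions. The point is only that the rule needs one token of lookbehind and no parser, so it also works on incomplete snippets.

```python
import re

# Toy subset of reserved words (assumption); PHP's real list is much longer.
RESERVED = {"function", "class", "list", "new", "return", "echo"}

# Token patterns for a tiny PHP-like subset, for illustration only.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<arrow>->)
  | (?P<dcolon>::)
  | (?P<variable>\$[A-Za-z_][A-Za-z0-9_]*)
  | (?P<label>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<punct>[(){};=,])
""", re.VERBOSE)


def tokenize(src):
    """Return (kind, text) pairs, using only the previous token as context."""
    tokens = []
    unreserve_next = False  # True right after '->', '::' or keyword 'function'
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at offset {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue  # whitespace does not reset the lookbehind state
        if kind == "label":
            # A reserved word is a keyword unless the previous token unreserved it.
            is_keyword = text.lower() in RESERVED and not unreserve_next
            kind = "keyword" if is_keyword else "identifier"
        tokens.append((kind, text))
        unreserve_next = kind in ("arrow", "dcolon") or (
            kind == "keyword" and text.lower() == "function"
        )
    return tokens


if __name__ == "__main__":
    for snippet in ("$obj->list();", "Foo::new();", "list($a) = $b;"):
        print(snippet, tokenize(snippet))
```

In this sketch `function class() {}` tokenizes `class` as a plain identifier. Rejecting that name for a free function would be a separate check in the compiler, which the sketch does not model, and cases such as by-reference functions (`function &foo`) or trait adaptations are not handled either.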