I’m planning on adding this functionality in some form to HHVM; however, if it’s also wanted in PHP, I’d rather not add something HHVM-specific, and I’ll be happy to put up RFCs :)

Location Information
--------------------

token_get_all() returns a line number for some tokens. I propose adding an additional TOKEN_EXTENDED_LOCATION flag that would include:

- starting line and character number within that line
- ending line and character number within that line

T_ENCAPSED_AND_WHITESPACE and T_INLINE_HTML seem to be the most common cases of start line !== end line.

Raw Tokens
----------

While token_get_all() is documented as returning whatever the lexer sees, in practice third-party software frequently depends on specific output. This gives you 3 options:

1. limit changes you make to the lexer to preserve BC
2. lie about the tokens to preserve BC
3. break BC

In our experience, #3 is not practical and #1 can lead to much more complicated solutions for problems that would be easily fixable in the lexer, so we went for #2. For example, HHVM converts:

- T_HASHBANG to T_INLINE_HTML
- T_ELSEIF to T_ELSE T_WHITESPACE T_IF

However, this means that there’s currently no way to get the real lexer tokens. I propose adding a TOKEN_RAW flag, which should explicitly allow implementation-specific tokens and make no guarantees about output stability. For now, this would be a no-op in PHP; however, it would give you more freedom in modifying the lexer in the future (in combination with #2 if the flag isn’t specified).

With thanks,
- Fred
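
P.S. To make the location proposal a bit more concrete, here is a rough userland sketch of the kind of data TOKEN_EXTENDED_LOCATION might return. It only uses the existing token_get_all() output and relies on the token texts concatenating back to the source in order. The function name tokens_with_location() and the array keys are made up for illustration, and the choice of 1-based byte columns with an inclusive end position is an assumption, not part of the proposal.

```php
<?php
// Userland sketch only: TOKEN_EXTENDED_LOCATION does not exist, so this
// derives similar data from the plain token_get_all() output.
// The function name and the array keys are hypothetical.
function tokens_with_location(string $source): array
{
    $line = 1; // 1-based line of the next unread byte
    $col  = 1; // 1-based byte column of the next unread byte
    $out  = [];

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        $startLine = $endLine = $line;
        $startCol  = $endCol  = $col;

        $length = strlen($text);
        for ($i = 0; $i < $length; $i++) {
            // Remember where this byte sits; the last one is the token's end.
            $endLine = $line;
            $endCol  = $col;
            if ($text[$i] === "\n") {
                $line++;
                $col = 1;
            } else {
                $col++;
            }
        }

        $out[] = [
            'id'         => $id,
            'text'       => $text,
            'start_line' => $startLine,
            'start_col'  => $startCol,
            'end_line'   => $endLine,
            'end_col'    => $endCol,
        ];
    }

    return $out;
}
```

For the input "a\nb<?php echo 1;", the first entry is the T_INLINE_HTML token "a\nb", which starts at line 1, column 1 and ends at line 2, column 1. That is the start line !== end line case mentioned above. A native implementation could of course pick different conventions (character rather than byte columns, exclusive end positions, and so on).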