On Tue, Jan 5, 2016 at 8:45 PM, Sara Golemon <poll...@php.net> wrote:
> On Tue, Jan 5, 2016 at 6:16 AM, Nikita Popov <nikita....@gmail.com> wrote:
> > Would be nice if someone could come up with a more explicit name for the
> > flag. TOKEN_FULL is not obvious, at least to me. TOKEN_ALWAYS_ARRAY?
> >
> Yeah, I'm not a huge fan of the name either, but I couldn't come up
> with anything better at the time.
>
> Maybe TOKEN_ASSOC? Since it provides associative array elements (as
> opposed to the current indexed array behavior)
>

I like that one.

> > I'd also like to have a flag TOKEN_NO_LINENOS with deduplication of token
> > arrays, but that's a separate matter...
> >
> Not sure what you're suggesting here. Can you elaborate?
>

Basically: token_get_all() is rather slow. I think it says something that
getting the tokens of a script is about as slow as lexing it, parsing it
into an internal AST and constructing an object-based userland AST for it.

If you use token_get_all() in a manner that only requires one lookahead
token at a time, you don't really care about how nice the token format is,
you're only interested in it being efficient.

I was hoping that we could optimize it by dropping the line numbers (which
are the most volatile part of the structure) and reusing the same array for
tokens which have the same ID and content (but likely a different lineno).
It's very likely that a script contains the T_WHITESPACE( ) token more than
once, and similarly labels and variables tend to repeat, etc.

No idea if that would actually work/help, just an idea.

Nikita
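
To make the shape of the idea concrete, here is a rough userland sketch. The
function name dedup_tokens() and the "id:text" pool key are made up for
illustration, and TOKEN_NO_LINENOS does not exist. Doing this in userland
saves nothing, since token_get_all() has already built every array by then;
the point is only to show what a deduplicated result could look like if the
extension produced it directly.

```php
<?php
// Hypothetical helper, not part of ext/tokenizer.
function dedup_tokens(string $source): array
{
    $pool = []; // "id:text" => one shared [id, text] array, no line number
    $out  = [];

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens are already returned as plain strings.
            $out[] = $token;
            continue;
        }

        list($id, $text) = $token; // $token[2], the line number, is dropped
        $key = $id . ':' . $text;

        if (!isset($pool[$key])) {
            $pool[$key] = [$id, $text];
        }

        // Arrays are copy-on-write, so every repeat of the same token
        // refers to the pooled array until somebody writes to it.
        $out[] = $pool[$key];
    }

    return $out;
}
```

For something like dedup_tokens('<?php $a = $a + $a;'), all three $a tokens
and all four single-space whitespace tokens would come from the pool instead
of being separate arrays.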