Re: [PHP-DEV] [RFC] Normalize token_get_all() output (with flag)

Nikita Popov Wed, 06 Jan 2016 01:44:37 -0800

On Tue, Jan 5, 2016 at 8:45 PM, Sara Golemon <poll...@php.net> wrote:

> On Tue, Jan 5, 2016 at 6:16 AM, Nikita Popov <nikita....@gmail.com> wrote:
> > Would be nice if someone could come up with a more explicit name for the
> > flag. TOKEN_FULL is not obvious, at least to me. TOKEN_ALWAYS_ARRAY?
> >
> Yeah, I'm not a huge fan of the name either, but I couldn't come up
> with anything better at the time.
>
> Maybe TOKEN_ASSOC? Since it provides associative array elements (as
> opposed to the current indexed array behavior)
>

I like that one.

> > I'd also like to have a flag TOKEN_NO_LINENOS with deduplication of token
> > arrays, but that's a separate matter...
> >
> Not sure what you're suggesting here.  Can you elaborate?
>

Basically: token_get_all() is rather slow. I think it says something that
getting the tokens of a script is about as slow as lexing it, parsing it
into an internal AST and constructing an object-based userland AST for it.
If you use token_get_all() in a manner that only requires one lookahead
token at a time, you don't really care about how nice the token format is,
you're only interested in it being efficient. I was hoping that we could
optimize it by dropping the line numbers (which are the most volatile part
of the structure) and trying to reuse the same array for tokens which have
the same ID and content (but likely different lineno). It's very likely that
a script contains the T_WHITESPACE( ) token more than once, and similarly
labels and variables tend to repeat, etc. No idea if that would actually
work/help, just an idea.
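
A rough userland sketch of the deduplication idea, for illustration only.
The actual optimization would have to happen inside the C implementation of
token_get_all(); the function name token_get_all_deduplicated() is
hypothetical and TOKEN_NO_LINENOS does not exist as a flag. The sketch only
shows which tokens could share one array once the line number is dropped:

```php
<?php
// Hypothetical helper, not part of PHP: emulates "no line numbers plus
// deduplicated token arrays" on top of the existing token_get_all().
function token_get_all_deduplicated(string $code): array
{
    $cache  = [];  // "id:text" => shared [id, text] array
    $tokens = [];

    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            // Single-character tokens are already returned as plain strings.
            $tokens[] = $token;
            continue;
        }

        // Drop the line number (index 2) so tokens with the same ID and
        // content become identical.
        $key = $token[0] . ':' . $token[1];
        if (!isset($cache[$key])) {
            $cache[$key] = [$token[0], $token[1]];
        }

        // PHP arrays are copy-on-write, so this assignment reuses the cached
        // array's storage instead of allocating a new array per token.
        $tokens[] = $cache[$key];
    }

    return $tokens;
}

$tokens = token_get_all_deduplicated('<?php $a = $a + $a;');
```

Here the three $a variables and the four single-space T_WHITESPACE tokens
each end up backed by one shared array. Whether that saves enough to matter
in practice is exactly the open question.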

Nikita
