On Fri, Mar 24, 2017 at 12:33 AM, Sara Golemon <poll...@php.net> wrote:

> On Thu, 23 Mar 2017 18:16:31 +0100, Nikita Popov <nikita....@gmail.com>
> > I'd like to add a new TOKEN_AS_OBJECT flag to token_get_all(), which
> > returns an array of PhpToken objects, rather than the mix of plain
> strings
> > and arrays we currently have. The PhpToken class is defined as:
> >
> > class PhpToken {
> >     public $type;
> >     public $text;
> >     public $line;
> > }
> >
> > This has been previously suggested and implemented by Rouven Weßling [1],
> > I've just ported this feature to master and optimized the implementation
> > [2].
> >
> >
> Yeah, IIRC the more recent discussion ended at "Oh, and this! And
> this! And this!" which eventually went nowhere (largely my ADD).  IMO
> there's no harm in adding this, and the class format seems entirely
> reasonable.
>
> If I may bikeshed a TINY bit, I'd ask that all tokens return as
> objects, rather than char|PhpToken similar to the current char|array
> format we have. (Maybe that's in the PR, I haven't looked at either)
>

That's how it's currently implemented, yes. The output is PhpToken[].


> On Thu, Mar 23, 2017 at 5:25 PM, Jan Tvrdík <j....@centrum.cz> wrote:
> > Regarding memory - would it be possible to return iterator instead of
> array?
> >
> That has some hairy edges to it since the lexer isn't actually
> reentrant within a given thread. Image the following:
>
> foreach (token_get_all($code, TOKEN_AS_ITERATOR) as $token) {
>   $moretokens = token_get_all($morecode, $whateverflags); // <--- Here
> be broken dreams and crying unicorns
> }
>
> Worse still if you have parallel iterators.
>
> That's probably fixable, but it's a much heavier refactor and one that
> should probably have an RFC.
>

The lexer is reentrant as long as the lexer state is saved and restored
every time the iterator is advanced -- we already need to support reentry
because, for various unpleasant reasons, it is actually possible that a
compilation is triggered while another is still in progress...

But yes, this is definitely a much more complicated change, and one I would
not consider to be particularly useful. In my experience most interesting
uses of token_get_all() are not compatible with a simple iterator, or would
eschew it because the iterator would in all likelihood perform worse.

Nikita

Reply via email to