On Fri, Mar 24, 2017 at 12:33 AM, Sara Golemon <poll...@php.net> wrote:
> On Thu, 23 Mar 2017 18:16:31 +0100, Nikita Popov <nikita....@gmail.com> > > I'd like to add a new TOKEN_AS_OBJECT flag to token_get_all(), which > > returns an array of PhpToken objects, rather than the mix of plain > strings > > and arrays we currently have. The PhpToken class is defined as: > > > > class PhpToken { > > public $type; > > public $text; > > public $line; > > } > > > > This has been previously suggested and implemented by Rouven Weßling [1], > > I've just ported this feature to master and optimized the implementation > > [2]. > > > > > Yeah, IIRC the more recent discussion ended at "Oh, and this! And > this! And this!" which eventually went nowhere (largely my ADD). IMO > there's no harm in adding this, and the class format seems entirely > reasonable. > > If I may bikeshed a TINY bit, I'd ask that all tokens return as > objects, rather than char|PhpToken similar to the current char|array > format we have. (Maybe that's in the PR, I haven't looked at either) > That's how it's currently implemented, yes. The output is PhpToken[]. > On Thu, Mar 23, 2017 at 5:25 PM, Jan Tvrdík <j....@centrum.cz> wrote: > > Regarding memory - would it be possible to return iterator instead of > array? > > > That has some hairy edges to it since the lexer isn't actually > reentrant within a given thread. Image the following: > > foreach (token_get_all($code, TOKEN_AS_ITERATOR) as $token) { > $moretokens = token_get_all($morecode, $whateverflags); // <--- Here > be broken dreams and crying unicorns > } > > Worse still if you have parallel iterators. > > That's probably fixable, but it's a much heavier refactor and one that > should probably have an RFC. > The lexer is reentrant as long as the lexer state is saved and restored every time the iterator is advanced -- we already need to support reentry because, for various unpleasant reasons, it is actually possible that a compilation is triggered while another is still in progress... But yes, this is definitely a much more complicated change, and one I would not consider to be particularly useful. In my experience most interesting uses of token_get_all() are not compatible with a simple iterator, or would eschew it because the iterator would in all likelihood perform worse. Nikita