Re: [PHP-DEV] [RFC] token_get_all() TOKEN_AS_OBJECT mode

Nikita Popov Fri, 14 Feb 2020 02:26:12 -0800

On Fri, Feb 14, 2020 at 9:48 AM Nikita Popov <nikita....@gmail.com> wrote:


> On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield <la...@garfieldtech.com>
> wrote:
>
>> On Thu, Feb 13, 2020, at 3:47 AM, Nikita Popov wrote:
>> > Hi internals,
>> >
>> > This has been discussed a while ago already, now as a proper proposal:
>> > https://wiki.php.net/rfc/token_as_object
>> >
>> > tl;dr is that it allows you to get token_get_all() output as an array of
>> > PhpToken objects. This reduces memory usage, improves performance, makes
>> > code more uniform and readable... What's not to like?
>> >
>> > An open question is whether (at least to start with) PhpToken should be
>> > just a data container, or whether we want to add some helper methods to
>> it.
>> > If this generates too much bikeshed, I'll drop methods from the
>> proposal.
>> >
>> > Regards,
>> > Nikita
>>
>> I love everything about this.
>>
>> 1) I would agree with Nicolas that a static constructor would be better.
>> I don't know about polyfilling it, but it's definitely more
>> self-descriptive.
>>
>> 2) I'm skeptical about the methods.  I can see them being useful, but
>> also being bikeshed material.  For instance, if you're doing annotation
>> parsing then docblocks are not ignorable.  They're what you're actually
>> looking for.
>>
>> Two possible additions, feel free to ignore if they're too complicated:
>>
>> 1) Should it return an array of token objects, or a lazy iterable?  If
>> I'm only interested in certain types (eg, doc strings, classes, etc.) then
>> a lazy iterable would allow me to string some filter and map operations on
>> to it and use even less memory overall, since the whole tree is not in
>> memory at once.
>>
>
> I'm going to take you up on your offer and ignore this one :P Returning
> tokens as an iterator is inefficient because it requires full lexer state
> backups and restores for each token. Could be optimized, but I wouldn't
> bother with it for this feature. I also personally have no use-case for a
> lazy token stream. (It's technically sufficient for parsing, but if you
> want to preserve formatting, you're going to be preserving all the tokens
> anyway.)
>
>
>> 2) Rather than provide bikesheddable methods, would it be feasible to
>> take a queue from PDO and let users specify a subclass of PhpToken to fetch
>> into?  That way the properties are always there, but a user can attach
>> whatever methods make sense for them.
>>
>
> It would be technically feasible. If we go with a static method for
> construction, then one might even say that there's reasonable expectation
> that PhpToken::getAll(...) is going to return PhpToken[] and
> MyPhpTokenExtension::getAll() is going to return MyPhpTokenExtension[].
>
> I'm a bit apprehensive about this though, specifically because you mention
> PDO... which, I think, isn't exactly a success story when it comes to this.
> If we do this, then the behavior would be that the object gets created, the
> properties populated, and *no constructor gets called*. The last part is
> important -- when you start calling constructors and magic methods, that's
> where the mess starts and you get PDO.
>

After thinking about this a bit more, there's a very nice solution to this:
Something that's missing from the current proposal is a constructor. Right
now, if code wants to insert new tokens, then those would have to be
constructed by creating the object and then manually assigning properties,
so we should definitely have a constructor.

Once we have one, we can mark it final, and thus make the construction
behavior well-defined, even if the class is extended.

Nikita

Re: [PHP-DEV] [RFC] token_get_all() TOKEN_AS_OBJECT mode

Reply via email to