On 3/24/2017 4:23 PM, Andrea Faulds wrote:
> Hi Nikita,
> 
> Nikita Popov wrote:
> 
>> I'd like to add a new TOKEN_AS_OBJECT flag to token_get_all(), which
>> returns an array of PhpToken objects, rather than the mix of plain
>> strings
>> and arrays we currently have. The PhpToken class is defined as:
>>
>> class PhpToken {
>>     public $type;
>>     public $text;
>>     public $line;
>> }
> 
> Rather than adding a flag to token_get_all() to return objects, you
> could potentially instead make an equivalent static method on PhpToken
> (PhpToken::getAll() perhaps). That would avoid mixing “object-oriented”
> and “procedural” styles, though I don't know if it matters. It seems
> cleaner to me.
> 
> Thanks!
> 

I am also against adding another flag because it violates the single
responsibility principle. However, adding a `getAll` method to the
`PhpToken` data class also seems very wrong, because it violates the
single responsibility once again.

The real and proper solution is to have an actual PHP Lexer that is
capable of creating a token stream.

    class Lexer {
        public function __construct(string $source);
        public static function fromFile(string $path): self;
        public function tokenize(): TokenStream;
    }

    final class TokenStream implements IteratorAggregate {
        public function getIterator(): Generator<Token>
        public function toArray(): Token[]
    }

    final class Token {
        // Ideally this would be an enum, but...
        public const OPEN_TAG;
        public const CLOSE_TAG;
        // ...

        // I hope these are not mutable!
        public int $category;  // type
        public string $lexeme; // text
        public int $line;
        public int $column;    // we don't have that :(

        /** @see token_name */
        public function getCategoryName(): string;

        // We could add `is*` methods for the various categories here.
        public function isOpenTag(): bool;
        public function isCloseTag(): bool;
        // ...
    }

With this in place, we're ready for the future. The `TokenStream` can
easily use a generator over the internal array. Or we offer the
`toArray` method only at the beginning. Users will have to call it, but
that overhead is tiny, considering that we can extend upon it in the
future without introducing any kind of breaking change.

Obviously this could go into the namespace `PHP\Parser`, or we prefix
everything with `PHP` (I'd be for the former).

On a side note, going for `Php` instead of `PHP` is inconsistent with
the naming that is currently dominating the PHP core:

https://secure.php.net/manual/en/indexes.functions.php

There are already some things that violate it, e.g. `Phar` instead of
`PHAR`, but most things keep acronyms intact (`DOM`, `XML`, `PDO`, ...).
Introducing even more inconsistency to PHP seems like a very bad idea to
me. I know that this is considered bikeshedding by many people, but
consistency is very important and we are already lacking it on too many
levels as it is.

-- 
Richard "Fleshgrinder" Fussenegger

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to