Hello, internals!

I'd like to bring this topic up for discussion again, because we now have a
shiny PHP7 engine and keep moving forward. The PHP grammar is becoming more
complex, and the internal AST representation isn't available to userland
developers, so all static analysis tools suffer from the lack of a native
AST API. The only option today is to tokenize the source code and then
manually reconstruct an AST (thanks to PHP-Parser and Nikita for doing this
job for us developers).

Several days ago, Dmitry published an RFC for native attributes, which can
be a great tool for building more complex things on top of this metadata.
However, all attribute expressions will be stored as AST nodes, so we again
need an AST API to analyse and parse the AST, or to compile it back into
source code for eval, etc.

It would be nice to move the php-ast extension (or a similar one) into the
core, providing an API via a static class, for example "Php\Parser".

2015-03-03 19:12 GMT+03:00 Leigh <lei...@gmail.com>:

> On 3 March 2015 at 11:56, Alexander Lisachenko <lisachenko...@gmail.com>
> wrote:
> > Good morning!
> >
> > I have cleaned https://wiki.php.net/rfc/parser-extension-api and restricted
> > its scope only to the parsing API. Extension API can be implemented later
>
> +1
>
> > on top of
> > https://github.com/php/php-src/commit/1010b0ea4f4b9f96ae744f04c1191ac228580e48
> > and current implementation, because it requires a lot of discussion and
> > cannot be implemented right now.
>
> I had no idea that zend_ast_process was such a recent addition, and
> part of your proposal. I've actually started using it already
> completely independently in one of my extensions!
>
> > 1. Should each node type be represented as its own class?
> > There are two possible ways: a single node class for everything (current
> > proposal) and a separate class for every node kind. I did some quick
> > research into AST APIs in several languages and noticed that most of them
> > generate AST node classes automatically, e.g. UnaryOperationNode,
> > StatementNode... This way is cool, but it requires a lot of classes to be
> > loaded and available as API. Pros: SRP for each node class, AST validators
> > (can be described with XML and DTD), clearer code analysis (checks with
> > $node instanceof StatementNode), typehints for concrete nodes in visitors.
> > However, PHP is a dynamic language and we can use only one class for
> > everything, adjusting the `kind` property for each node. Pros: single class,
> > easier to maintain and to use. Cons: bad UX (how to validate nodes, how to
> > determine the number of available children, how to check whether a node
> > uses flags or not, etc.)
>
> I think we need to at least represent all of the current node structures.
>
> A common base class, and then classes to represent lists, zvals and
> decls that extend from this base.
>
> > 2. Where should metadata be stored (flags, names of node kinds, relations
> > between node types)? This information will be needed later for validation
> > of the AST.
> >
> > Nikita has some thoughts for the future :) So he asked about the storage
> > of metadata to validate an AST and to perform some analysis on it. Metadata
> > should include the following: the name of each node kind (this can be just
> > the class name of the node or constants in the class), node restrictions
> > (which node kinds can be used as children of a concrete node; number of
> > child nodes), node flag names (PUBLIC, PROTECTED, PRIVATE, etc.)
>
> Thinking for the future is fine, but do we need this metadata for the
> current proposal? Is the AST returned by the parser in a read-only
> state, or can users create their own nodes + children and get a pretty
> printed output? If it's the latter then we obviously need to know the
> restrictions.
>
> I think we need a mechanism that keeps names/numbers in sync
> automatically, maybe we can use some macros to automatically generate
> enums and userland facing details at the same time, so we don't have
> to keep several places in sync if/when new AST nodes are added.
>
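
On the last quoted point, about generating the enums and the userland-facing
details from one place: below is a minimal sketch of how that could look
with an X-macro in C. The kind names, child counts and helper functions
(FOR_EACH_AST_KIND, ast_kind_name, ast_kind_num_children, ast_kind_from_name)
are all hypothetical and chosen for illustration only; this is not the
engine's actual node list or API.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Single source of truth: X(name, number_of_children).
 * Hypothetical kinds for illustration, not the real engine list. */
#define FOR_EACH_AST_KIND(X) \
    X(ZVAL,      0)          \
    X(VAR,       1)          \
    X(UNARY_OP,  1)          \
    X(BINARY_OP, 2)          \
    X(ASSIGN,    2)

/* Expansion 1: the enum used on the C side. */
#define AS_ENUM(name, nchildren) AST_KIND_##name,
enum ast_kind { FOR_EACH_AST_KIND(AS_ENUM) AST_KIND_COUNT };

/* Expansion 2: a metadata table that could back userland constants
 * and simple validation (kind name, expected child count). */
struct ast_kind_info {
    const char *name;
    int num_children;
};

#define AS_INFO(name, nchildren) { #name, nchildren },
static const struct ast_kind_info ast_kind_table[] = {
    FOR_EACH_AST_KIND(AS_INFO)
};

/* Both expansions come from the same list, so this can never fire
 * unless someone edits the table by hand. */
static_assert(sizeof(ast_kind_table) / sizeof(ast_kind_table[0]) == AST_KIND_COUNT,
              "enum and metadata table are out of sync");

/* Returns the kind name, or NULL for an unknown kind. */
const char *ast_kind_name(int kind)
{
    if (kind < 0 || kind >= AST_KIND_COUNT) {
        return NULL;
    }
    return ast_kind_table[kind].name;
}

/* Returns the expected number of children, or -1 for an unknown kind. */
int ast_kind_num_children(int kind)
{
    if (kind < 0 || kind >= AST_KIND_COUNT) {
        return -1;
    }
    return ast_kind_table[kind].num_children;
}

/* Reverse lookup by name; returns -1 if the name is not a known kind. */
int ast_kind_from_name(const char *name)
{
    for (int i = 0; i < AST_KIND_COUNT; i++) {
        if (strcmp(ast_kind_table[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}
```

With this layout, adding a node kind means adding one X(...) line; the enum
value, its name and its child count all follow from that single entry, so
there are no separate places to keep in sync.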
