[PATCH] D130626: [pseudo] experiment

Haojian Wu via Phabricator via cfe-commits Wed, 27 Jul 2022 15:07:02 -0700

hokein added a comment.

The solution is imperfect here (it is based on identifier content, so it 
won't work if the code happens to have two tokens with the same content but 
different kinds, e.g. `trace::Span Span;`), but it at least shows that 
ambiguities in identifiers are the most critical ones.


It also gives some evidence that if we know the type information of all 
identifiers, we could potentially get a perfect parse tree even without any 
soft disambiguation mechanism. I think it might affect our designs in some 
scenarios -- for example, if we have a fresh clang AST, we can first annotate 
all identifier tokens (what does this identifier refer to: a class, an enum, 
etc.), and use this approach to build a forest in the pseudoparser (we might 
not need real disambiguation, because the forest likely ends up with a single 
perfect tree :)).
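
A rough sketch of the annotation idea, for illustration only. Everything here 
(`IdentKind`, `Annotations`, `allows`, `viableRules`) is a hypothetical name, 
not part of clang or the pseudoparser. It assumes annotations are keyed by 
token index rather than by spelling, so that `trace::Span Span;` is handled, 
and it only filters candidate grammar symbols for one identifier token:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical: what an identifier token refers to, as reported by an AST.
enum class IdentKind { Class, Enum, Typedef, Namespace, Template, Value, Unknown };

// Hypothetical: annotations keyed by token index, not by token text.
using Annotations = std::map<unsigned, IdentKind>;

// Returns true if an identifier of kind K may be parsed as grammar symbol Rule.
bool allows(IdentKind K, const std::string &Rule) {
  switch (K) {
  case IdentKind::Class:     return Rule == "class-name";
  case IdentKind::Enum:      return Rule == "enum-name";
  case IdentKind::Typedef:   return Rule == "typedef-name";
  case IdentKind::Namespace: return Rule == "namespace-name";
  case IdentKind::Template:  return Rule == "template-name";
  case IdentKind::Value:     return Rule == "unqualified-id";
  case IdentKind::Unknown:   return true; // no information: keep everything
  }
  return true;
}

// Keeps only the candidate symbols consistent with the token's annotation.
std::vector<std::string> viableRules(const Annotations &A, unsigned TokenIndex,
                                     const std::vector<std::string> &Candidates) {
  auto It = A.find(TokenIndex);
  IdentKind K = It == A.end() ? IdentKind::Unknown : It->second;
  std::vector<std::string> Out;
  for (const auto &Rule : Candidates)
    if (allows(K, Rule))
      Out.push_back(Rule);
  return Out;
}
```

With the tokens of `trace::Span Span;` numbered 0..4, annotating token 2 as a 
class and token 3 as a value leaves a single candidate for each, even though 
both tokens have the same spelling.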

Had some offline discussion with @ilya-biryukov today:

If we look at all the ambiguities in `ASTSignals.cpp`:

  45 Ambiguous nodes:
     18 type-name
     14 simple-type-specifier
      5 postfix-expression
      3 namespace-name
      3 nested-name-specifier
      1 relational-expression
      1 template-argument

Most of the ambiguities (>90%) are just "local": they don't affect the 
structure of the tree, and they seem to be less useful. If we think about the 
final output, the clang syntax-tree, we care about tree structure, and these 
ambiguities provide basically zero value. For example:
in the nested-name-specifier `trace::Span` case, the clang syntax-tree models 
the `trace` specifier as a general identifier name specifier regardless of 
whether `trace` is a type-name or a namespace-name;
in the simple-declaration `ASTSignals Signals;` case, it is sufficient to know 
that it is a simple-declaration and that `ASTSignals` is a 
simple-type-specifier; whether the simple-type-specifier is a type-name (thus 
class-name, enum-name, typedef-name) or a template-name is less interesting, 
and we probably don't want to distinguish them in the clang syntax-tree;
similarly, in the postfix-expression `Foo(...);` case, we might use the same 
node in the clang syntax-tree to model a function call and an explicit class 
type conversion.

So one option is to eliminate these "local" ambiguities in the forest (by 
replacing type-name, class-name, enum-name, typedef-name and template-name 
with a generic `name`), as this won't affect the tree structure. Re the 
implementation, we can post-process the forest -- replace an ambiguous forest 
node if all its alternatives share the same tree structure; ad-hoc targeting 
of the type-name, simple-type-specifier and postfix-expression nonterminals is 
probably enough. An alternative is to adjust the cxx grammar rules (not sure 
how intrusive the change is).
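
To make the post-processing idea concrete, here is a minimal sketch. It uses a 
simplified, hypothetical `Node` type (not the pseudoparser's real forest node), 
and it assumes one particular definition of "same tree structure": two 
alternatives are equal once single-child chains such as 
type-name := class-name := IDENTIFIER are skipped. Picking the first 
alternative as the survivor is also an arbitrary choice of the sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified forest node (not the real pseudoparser type).
struct Node {
  enum Kind { Terminal, Sequence, Ambiguous } K;
  std::string Symbol; // token text for terminals, nonterminal name otherwise
  std::vector<std::shared_ptr<Node>> Kids; // elements or alternatives
};
using NodePtr = std::shared_ptr<Node>;

NodePtr tok(std::string T) { return std::make_shared<Node>(Node{Node::Terminal, std::move(T), {}}); }
NodePtr seq(std::string S, std::vector<NodePtr> Kids) {
  return std::make_shared<Node>(Node{Node::Sequence, std::move(S), std::move(Kids)});
}
NodePtr amb(std::string S, std::vector<NodePtr> Alts) {
  return std::make_shared<Node>(Node{Node::Ambiguous, std::move(S), std::move(Alts)});
}

// Structural fingerprint of a subtree; unary chains are skipped.
std::string shape(const Node &N) {
  if (N.K == Node::Terminal)
    return N.Symbol;
  assert(!N.Kids.empty() && "non-terminal nodes must have children");
  if (N.K == Node::Sequence && N.Kids.size() == 1)
    return shape(*N.Kids.front());
  // Sequences use "(...)", unresolved ambiguities use "{...}".
  std::string S = N.K == Node::Sequence ? "(" : "{";
  for (const auto &C : N.Kids)
    S += shape(*C) + (N.K == Node::Sequence ? " " : "|");
  return S + (N.K == Node::Sequence ? ")" : "}");
}

// Rebuilds the forest bottom-up, replacing each ambiguous node whose
// alternatives all have the same shape by its first alternative.
NodePtr collapseLocal(const NodePtr &N) {
  if (N->K == Node::Terminal)
    return N;
  std::vector<NodePtr> Kids;
  for (const auto &C : N->Kids)
    Kids.push_back(collapseLocal(C));
  if (N->K == Node::Ambiguous) {
    bool Same = true;
    for (const auto &Alt : Kids)
      Same = Same && shape(*Alt) == shape(*Kids.front());
    if (Same)
      return Kids.front();
  }
  return std::make_shared<Node>(Node{N->K, N->Symbol, std::move(Kids)});
}

// Counts ambiguous nodes (shared subtrees are counted once per path).
int countAmbiguous(const Node &N) {
  int Count = N.K == Node::Ambiguous ? 1 : 0;
  for (const auto &C : N.Kids)
    Count += countAmbiguous(*C);
  return Count;
}
```

Under this definition, a type-name node whose alternatives are class-name and 
enum-name over the same identifier collapses, while an ambiguity whose 
alternatives group the tokens differently is left in place.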

The only "real" ambiguity is 
`dyn_cast<NamespaceDecl>(ND->getDeclContext())` (whether it is a postfix 
expression, or a pair of comparison expressions). This is a real ambiguity in 
C++ that requires type information to resolve. We can't eliminate ambiguities 
of this kind, and for them we do need a ranking-based disambiguation.
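
As a small illustration of why this case needs type information, the same 
token sequence `f < B > (C)` is valid C++ under both readings, depending only 
on what `f` and `B` were declared as. The names below are made up for the 
example and are unrelated to `ASTSignals.cpp`:

```cpp
#include <cassert>

namespace as_template {
// Here f is a function template and B is a type, so `f<B>(C)` is a
// postfix-expression: a call to the specialization f<B>.
struct B {};
template <typename T> int f(int X) { return X + 1; }
int parse(int C) { return f<B>(C); }
} // namespace as_template

namespace as_comparison {
// Here f and B are variables, so the same tokens form two comparisons:
// (f < B) > (C). Compilers may warn about the chained comparison.
int f = 1, B = 2;
bool parse(int C) { return f < B > (C); }
} // namespace as_comparison
```

A parser without declarations in scope cannot choose between the two, which is 
why both alternatives have to stay in the forest.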


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130626/new/

https://reviews.llvm.org/D130626

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
