[PATCH] D61637: [Syntax] Introduce syntax trees

Ilya Biryukov via Phabricator via cfe-commits Fri, 21 Jun 2019 05:09:10 -0700

ilya-biryukov marked 13 inline comments as done.
ilya-biryukov added a comment.


This is ready for another round now.



================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:37
+  /// and \p Last are added as children of the new node.
+  void learnNode(SourceLocation Fist, SourceLocation Last,
+                 syntax::TreeNode *New);
----------------
sammccall wrote:
> sammccall wrote:
> > maybe formNode, formToken, formRoot()?
> if syntax nodes strictly nest and we form left-to-right and bottom up, then 
> why are there ever pending nodes that aren't in the range? Is it because we 
> don't aggregate them as early as possible?
> 
> (edit: after offline discussion, there are precise invariants here that could 
> be documented and asserted)
Sorry, missed this comment and went with `expectToken()` and `expectNode()`, 
root is now built on `consume()`. Can change to `form*`, not a big deal.

I still need to write good docs about the invariants here, so leaving this open.



================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:48
+  /// Consume the root node.
+  syntax::TranslationUnit *root() && {
+    assert(Root);
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > any particular reason learnRoot() and root() are different functions?
> > 
> > If learnRoot() returned TranslationUnit*, then we avoid the need for the 
> > caller to know about the dependency, it would track the state itself.
> `learnRoot` is called inside ast visitor when processing 
> `TranslationUnitDecl` and `root()` is used to consume the result.
> 
> I guess we could just delay `learnRoot` until `consume()` is called, 
> shouldn't be a big deal. I'll do this in the next iteration.
`consume()` now builds the root node.


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:92
+  bool TraverseCompoundStmt(CompoundStmt *S) {
+    Builder.learnTokenNode(S->getLBracLoc(), tok::l_brace);
+    Builder.learnTokenNode(S->getRBracLoc(), tok::r_brace);
----------------
sammccall wrote:
> So explicitly claiming the primitive tokens, but implicitly claiming the 
> subtrees, seems like a weird mix.
> Having both explicit might be nicer:
>  - It seems somewhat likely we want to record/tag their semantics (maybe in 
> the child itself, or even the low bits of its pointer?), rather than having 
> accessors scan around looking for something likely.
>  - currently when expected subtrees fail to parse, their tokens get 
> (implicitly) wrapped up in Recovery nodes. They're good targets for heuristic 
> parsing, but this probably means we should record what an unexplained range 
> of tokens is supposed to be for.
> 
> Thinking of something like:
> ```
> builder.expectToken(l_brace, S->getLBracLoc());
> builder.expectTree(Statement, S->getBody());
> builder.expectToken(r_brace, S->getRBracLoc());
> ```
> where the builder would react to non-null AST nodes by mapping the associated 
> syntax node, and null AST nodes by trying to heuristically parse the tokens 
> in between LBracLoc and RBracLoc.
> 
> But lots of unanswered questions here: body is a list of statements, how does 
> that work? What if LBracLoc or RBracLoc is missing? etc.
That's exactly the design we have one, with a limitation that `expect*` have to 
be called in left-to-right and bottom-up manner. Also, you can only build a 
tree that ends at the tokens that were consumed.

This is actually a reasonable interface to build a tree from an actual parser, 
but might feel weird for a ast-to-syntax transformation. I need to figure out a 
way to write good docs about it, but there's a separate comment for that. 
Marking this as done (although the questions you mentioned at the end are still 
there)


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:135
+
+void syntax::TreeBuilder::learnNodeImpl(const syntax::Token *Begin,
+                                        const syntax::Token *End,
----------------
sammccall wrote:
> this function needs some high-level implementation comments
Done. Also renamed it to `consumeNode`. 

The docs are very short, though, might need a revamp for clarity.


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:164
+  };
+  for (auto It = FirstChild; It != NodesInProgress.end(); ++It) {
+    // Add non-coverred ranges as recovery nodes.
----------------
sammccall wrote:
> why can It not point to a node that spans/is past End?
> 
> (edit after offline discussion: it's an important invariant that we're always 
> consuming a suffix of the pending nodes)
This is now spelled out in the documentation for `foldChildren`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61637/new/

https://reviews.llvm.org/D61637



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D61637: [Syntax] Introduce syntax trees

Reply via email to