ilya-biryukov marked 13 inline comments as done. ilya-biryukov added a comment.
This is ready for another round now. ================ Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:37 + /// and \p Last are added as children of the new node. + void learnNode(SourceLocation Fist, SourceLocation Last, + syntax::TreeNode *New); ---------------- sammccall wrote: > sammccall wrote: > > maybe formNode, formToken, formRoot()? > if syntax nodes strictly nest and we form left-to-right and bottom up, then > why are there ever pending nodes that aren't in the range? Is it because we > don't aggregate them as early as possible? > > (edit: after offline discussion, there are precise invariants here that could > be documented and asserted) Sorry, missed this comment and went with `expectToken()` and `expectNode()`, root is now built on `consume()`. Can change to `form*`, not a big deal. I still need to write good docs about the invariants here, so leaving this open. ================ Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:48 + /// Consume the root node. + syntax::TranslationUnit *root() && { + assert(Root); ---------------- ilya-biryukov wrote: > sammccall wrote: > > any particular reason learnRoot() and root() are different functions? > > > > If learnRoot() returned TranslationUnit*, then we avoid the need for the > > caller to know about the dependency, it would track the state itself. > `learnRoot` is called inside ast visitor when processing > `TranslationUnitDecl` and `root()` is used to consume the result. > > I guess we could just delay `learnRoot` until `consume()` is called, > shouldn't be a big deal. I'll do this in the next iteration. `consume()` now builds the root node. ================ Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:92 + bool TraverseCompoundStmt(CompoundStmt *S) { + Builder.learnTokenNode(S->getLBracLoc(), tok::l_brace); + Builder.learnTokenNode(S->getRBracLoc(), tok::r_brace); ---------------- sammccall wrote: > So explicitly claiming the primitive tokens, but implicitly claiming the > subtrees, seems like a weird mix. > Having both explicit might be nicer: > - It seems somewhat likely we want to record/tag their semantics (maybe in > the child itself, or even the low bits of its pointer?), rather than having > accessors scan around looking for something likely. > - currently when expected subtrees fail to parse, their tokens get > (implicitly) wrapped up in Recovery nodes. They're good targets for heuristic > parsing, but this probably means we should record what an unexplained range > of tokens is supposed to be for. > > Thinking of something like: > ``` > builder.expectToken(l_brace, S->getLBracLoc()); > builder.expectTree(Statement, S->getBody()); > builder.expectToken(r_brace, S->getRBracLoc()); > ``` > where the builder would react to non-null AST nodes by mapping the associated > syntax node, and null AST nodes by trying to heuristically parse the tokens > in between LBracLoc and RBracLoc. > > But lots of unanswered questions here: body is a list of statements, how does > that work? What if LBracLoc or RBracLoc is missing? etc. That's exactly the design we have one, with a limitation that `expect*` have to be called in left-to-right and bottom-up manner. Also, you can only build a tree that ends at the tokens that were consumed. This is actually a reasonable interface to build a tree from an actual parser, but might feel weird for a ast-to-syntax transformation. I need to figure out a way to write good docs about it, but there's a separate comment for that. Marking this as done (although the questions you mentioned at the end are still there) ================ Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:135 + +void syntax::TreeBuilder::learnNodeImpl(const syntax::Token *Begin, + const syntax::Token *End, ---------------- sammccall wrote: > this function needs some high-level implementation comments Done. Also renamed it to `consumeNode`. The docs are very short, though, might need a revamp for clarity. ================ Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:164 + }; + for (auto It = FirstChild; It != NodesInProgress.end(); ++It) { + // Add non-coverred ranges as recovery nodes. ---------------- sammccall wrote: > why can It not point to a node that spans/is past End? > > (edit after offline discussion: it's an important invariant that we're always > consuming a suffix of the pending nodes) This is now spelled out in the documentation for `foldChildren`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D61637/new/ https://reviews.llvm.org/D61637 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits