ilya-biryukov added a comment. I've addressed most of the comments, except the naming ones. We need a convention for naming the language nodes and names for composite and leaf structural nodes.
For "language" nodes, I suggest we use `CompoundStatement`, `Recovery`, `TopLevelDeclaration`, `TemplateArgumentList`, `TypeTemplateArgument`, etc. That is, we spell out the words in full, no shorthands like `Stmt` or `Expr`. That would make things a bit more verbose, but hopefully that helps distinguish from clang AST. For structural nodes, see the relevant comment threads. ================ Comment at: clang/include/clang/Tooling/Syntax/Arena.h:1 +//===- Arena.h - memory arena and bookkeeping for syntax trees --- C++ -*-===// +// ---------------- sammccall wrote: > From a user's point of view, this looks a lot like part of the tree structure > in some sense. > > If you expect that users will need to keep the arena rather than the > TokenBuffer around (e.g. so nodes can be allocated), then it might make sense > to declare it at the bottom of `Cascade.h` Now part of `Tree.h` ================ Comment at: clang/include/clang/Tooling/Syntax/Corpus.h:38 + std::pair<FileID, llvm::ArrayRef<syntax::Token>> + tokenizeBuffer(std::unique_ptr<llvm::MemoryBuffer> Buffer); + ---------------- ilya-biryukov wrote: > sammccall wrote: > > ilya-biryukov wrote: > > > sammccall wrote: > > > > Are you planning to have a way to add tokens directly? Having to turn > > > > them into text and re-lex them seems like it might be inconvenient. > > > The tokens have source locations and refer to a text in some buffer. > > > `tokenizeBuffer` makes is easier, not harder, to mock tokens. > > Fair enough. > > > > `lexBuffer` might be a slightly clearer name? > > > > Who are the intended non-test users of this function? Are they better > > served by being able (and responsible) for constructing a MemoryBuffer with > > e.g. a sensible name and ownership, or would it be better to pass a > > StringRef and have the Arena come up with a sensible "anonymous" name? > The only use-case in my prototype so far is creating token nodes for > punctuation nodes, e.g. say you want to create an expr of the form `<a>+<b>`, > where both `a` and `b` are existing expressions and you need to synthesize a > leaf node for `+`. > We use this function to synthesize a buffer with the corresponding token. > > All the use-cases I can imagine are around synthesizing syntax trees (as > opposed to constructing them from the AST). Renamed to `lexBuffer`. This does not have usages (yet), we can also remove it from this patch if needed. ================ Comment at: clang/include/clang/Tooling/Syntax/Corpus.h:40 + + /// Construct a new syntax node of a specified kind. The memory for a node is + /// owned by the corpus and will be freed when the corpus is destroyed. ---------------- sammccall wrote: > ilya-biryukov wrote: > > sammccall wrote: > > > Now there's two ways to do this: `new (C.allocator()) T(...)` or > > > `C.construct<T>(...)`. Do we need both? > > > > > > (If we do, is the syntax `new (C) T(...)` more natural?) > > I think `C.construct<T>()` read better than `new (C) T(...)`. Not a big fan > > of placement new exprs. > They are fairly consistently used in llvm/clang for this sort of thing, > though. > > I find it valuable because arena allocations look otherwise a lot like buggy > ownership patterns. The dedicated syntax calls out the unusual case: we're > creating a new thing, and someone else owns it, but won't do anything with it. Removed in favour of placement new. I guess that also makes it a bit more natural to have separate storage for other nodes that have different lifetime (e.g. use a separate arena). ================ Comment at: clang/include/clang/Tooling/Syntax/Tree.h:11 +// +// The code is divided in the following way: +// - 'Tree/Cascade.h' defines the basic structure of the syntax tree, ---------------- sammccall wrote: > As discussed offline: > - I don't think (at this point) we need an umbrella header. Generic tree > structure, specific node semantics, and operations like "build a tree" are > distinct enough from a user POV that asking them to include headers is fine. > - We may want an umbrella for the node types, if we end up splitting that, > but no need yet. > - Splitting generic tree stuff vs specific node stuff sounds good, but I > think having them be sibling headers in `Tooling/Syntax` is enough - not sure > about the `Tree/` subdirectory. > > So I'd suggest something like: > - `Tree/Cascade.h` + `Arena.h` --> `Tree.h` > - `Tree.h` -> `BuildTree.h` > - `Tree/Nodes.h` + `NodeKind.h` --> `Nodes.h` > (have comments on some of these throughout) Changed to the proposed header structure, it looks good. Had to move `Leaf::classof` and `TreeNode::classof` into a `.cpp`, they need a definition of `NodeKind`. Keeping those in a header file was a micro-optimization anyway, that's probably not too important. ================ Comment at: clang/include/clang/Tooling/Syntax/Tree/Cascade.h:74 +/// A composite tree node that has children. +class TreeNode : public Node { +public: ---------------- sammccall wrote: > ilya-biryukov wrote: > > sammccall wrote: > > > This use of "tree node" to mean specifically internal node seems > > > confusing - is it common? > > I don't think it's common, can use `CompositeNode` - seems like a better > > alternative > What about Subtree? `Subtree` seems ok, although `Composite` conveys the meaning better to my taste. `Composite` does not seem to work without the `node` suffix, though, and we probably don't want the suffix in other nodes, so I'm torn on this. ================ Comment at: clang/include/clang/Tooling/Syntax/Tree/Cascade.h:59 +/// stream. +class Leaf final : public Node { +public: ---------------- sammccall wrote: > `syntax::Leaf` vs `syntax::Struct` seems a little odd - it talks about the > tree structure rather than the contents. (Unlike TreeNode/CompositeNode this > is likely to be used for its specific semantics). > > Maybe `syntax::Tokens` (though the s is subtle). `syntax::Text`? `syntax::Tokens` actually looks good, but we should rename the `tokens()` accessor somehow in that case. I have only super-generic variants in my head: `elements()`, `items()`. Any better ideas? ================ Comment at: clang/include/clang/Tooling/Syntax/Tree/Cascade.h:64 + + static bool classof(const Node *N) { return N->kind() == NodeKind::Leaf; } + ---------------- sammccall wrote: > you're going to want classof for each node type, a private copy constructor > (for cloning), a friend statement to whatever does the cloning (or the > clone() function itself, if it goes on the class...) > > You may want to put this behind a DEFINE_NODE_BOILERPLATE(Leaf) macro :-( I'd avoid using macros. As long as the amount of boilerplate is small, it's not too annoying to write. And it is small for now, we only have a single constructor and a `classof` per node, so keeping it explicit in this patch seems ok. We can revisit if more stuff pops up, but I think we do it without extra boilerplate per node for clone, and hopefully for other things too. ================ Comment at: clang/include/clang/Tooling/Syntax/Tree/NodeKind.h:1 +//===- NodeKind.h - an enum listing nodes of a syntax tree ----*- C++ -*-=====// +// ---------------- sammccall wrote: > Why is this separate from Nodes.h? Not anymore. The original reason was that `Tree.h` need `NodeKind` for implementing casts and the implementation was in a header. ================ Comment at: clang/include/clang/Tooling/Syntax/Tree/NodeKind.h:22 +/// For debugging purposes. +llvm::StringRef toString(NodeKind K); + ---------------- sammccall wrote: > if you make this `operator<<`, then it's slightly more flexible I think > (llvm::to_string also works). > It's not as fast, but I don't think that matters here? For some reason I thought `gdb` does not show the enum value names, so I added a named method to simplify debugging. Turns out it does show the enum names, not just int values, so I'm perfectly happy to the stream output operator. ================ Comment at: clang/include/clang/Tooling/Syntax/Tree/NodeList.h:1 +//===- NodeList.h ---------------------------------------------*- C++ -*-=====// +// ---------------- sammccall wrote: > Implementing custom containers is a bit sad. > > Alternatives we discussed: > - adapt bumpptrallocactor to std allocator and use std::vector: wastes a > pointer from vector->allocator > - use ArrayRef<Node*> or ArrayRef<Node> and require the whole list to be > reallocated when children change (but *replacing* children is fine) > - use a linked-list representation instead: Node* Node::NextSibling, > CompositeNode::FirstChild. This fits the allocation strategy more nicely. You > probably need only single links, and can add an iterator if needed. No more custom containers, explicit tree structure seems to be better. ================ Comment at: clang/lib/Tooling/Syntax/BuildFromAST.cpp:1 +//===- BuildFromAST.cpp ---------------------------------------*- C++ -*-=====// +// ---------------- sammccall wrote: > I haven't reviewed this file yet :-) Please do! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D61637/new/ https://reviews.llvm.org/D61637 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits