sammccall added a comment. OK, a bunch of random notes but I'm getting a higher level picture.
I'm not sure about the split between LRAutomaton and LRTable. I suppose doing it in two stages simplifies the implementation, but I don't think the first stage particularly needs to be public. Also I think the names are confusing: Automaton vs Table isn't a clear contrast as one describes behavior and one a data structure. And the "LRTable" seems more automaton-like to me! I think my suggestion would be: - call the final structure LRAutomaton - call the LRAutomaton::build()::Builder structure `LRGraph` or something - I'm not sure the LRAutomaton structure needs to exist exactly, seems like the current LRTable::build() could operate on `LRGraph` directly but I'm going to ponder this more. ================ Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:32 +public: + Item(RuleID ID, uint8_t RuleLength, uint8_t DotPos) + : RID(ID), RuleLength(RuleLength), DotPos(DotPos) {} ---------------- accepting the grammar as a parameter rather than the rule length would let us ensure the length is correct locally rather than relying on the caller. Essentially all the callers need to do a lookup anyway. ================ Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:45 + } + const pseudo::Rule &rule(const Grammar &G) const { return G.lookupRule(RID); } + RuleID ruleID() const { return RID; } ---------------- nit: redundant pseudo::qualifier ================ Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:47 + RuleID ruleID() const { return RID; } + int dotPos() const { return DotPos; } + std::string dump(const Grammar &G) const; ---------------- nit: unsigned. just dot()? and similarly ruleID()->rule()? Not sure these would really be unclear ================ Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:79 + +// A deterministic finite state automaton for LR parsing. +// ---------------- Hmm, isn't the point of GLR that it's a nondeterministic automaton? We've done the LR-style NFA->DFA conversion, but it's not complete (when we hit a conflict we continue rather than giving up) ================ Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:144 + "augmented symbol rule must be _ := start_symbol."); + auto StartState = closure({{{RID, Rule.size(), 0}}}, G); + auto Result = Builder.insert(std::move(StartState)); ---------------- nit: too many braces - please spell out some of the types (and Auto should probably be State here?) ================ Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:215 + for (const Item &I : Automaton.states()[SID]) { + // If "_ := start_symbol ." in the State, set Action[State, eof] to + // accept. ---------------- These comments are echoing the code - saying what, not why. "If we've just parsed the start symbol, we can accept the input". ================ Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:220 + Builder.addAction(SID, tokenSymbol(tok::eof), + LRTable::Action{LRTable::Action::Kind::Accept, {}}); + continue; ---------------- Seems like we should maybe have the accepted symbol as a payload on the Accept action? Actually why do we need an Accept action, as opposed to just a reduce action and recognize it because it's the symbol we want and the stack is empty? ================ Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:224 + if (!I.hasNext()) { + // If a state i has an item "A := x.", set Action[i, a] to reduce + // "A := x" for all a in FOLLOW(A). ---------------- "If we've reached the end of a rule A := ..., then we can reduce if the next token is in the follow set of A" Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D118196/new/ https://reviews.llvm.org/D118196 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits