hokein accepted this revision. hokein added a comment. This revision is now accepted and ready to land.
This looks a great improvement. A few nits, it looks good to me overall. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:136 + // check whether a reduction should apply in the current context. + llvm::ArrayRef<RuleID> getReduceRules(StateID State) const { + return llvm::makeArrayRef(&Reduces[ReduceOffset[State]], ---------------- I think the API is motivated by the LR(0) experimentation, we're now decoupling the reduction look up from the lookahead token, this seems nice (LR(0) only needs `getReduceRules`, other LR(1) variant need `getReduceRules` + `canFollow`). The API usage for SLR(1) is not quite straight forward at the first glance, would be better to add a small snippet in the comment. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:143 + assert(isToken(Terminal)); + assert(!isToken(Nonterminal)); + return FollowSets.test(tok::NUM_TOKENS * Nonterminal + ---------------- nit: use `isNonterminal(Nonterminal)`? IMO it is clearer. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:209 + + // Reduces from state S: Reduces[ [ReduceOffset[S], ReduceOffset[S+1]) ] + std::vector<uint32_t> ReduceOffset; ---------------- nit: I think rephasing it as: `a half-open range of Reduces [ReduceOffset[S], ReduceOffset[S+1]).` is clearer -- the current syntax `Reduces[ [ReduceOffset[S], ReduceOffset[S+1]) ]` is a bit weird (My first attempt was to read it as an unmatch-brackets code snippet) ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:212 + std::vector<RuleID> Reduces; + // T in Follow(NT) == FollowSets[NT * NUM_TOKENS + tok::Kind(T)] + llvm::BitVector FollowSets; ---------------- The NT is not quite obvious, what is NT? I think it is the symbol ID, then I don't understand the `NT * NUM_TOKENS + tok::Kind(T)` part. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:213 + // T in Follow(NT) == FollowSets[NT * NUM_TOKENS + tok::Kind(T)] + llvm::BitVector FollowSets; }; ---------------- Oh, I understand it now after reading the build code... NT here is the non-terminal (symbol to be reduced), T is the lookahead. And FollowSets is conceptually a flat array of a 2d bool array [Nonterminal, Tokens]. Probably add some more comments. ================ Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:369 // (Roughly this means there's no local ambiguity, so the LR algorithm works). bool popAndPushTrivial() { if (!Sequences.empty() || Heads->size() != NextPopHead + 1) ---------------- nit: add a comment about what does return value indicate. ================ Comment at: clang-tools-extra/pseudo/lib/grammar/LRTableBuild.cpp:116 + llvm::DenseMap<StateID, llvm::SmallSet<RuleID, 4>> Reduces; + std::vector<llvm::DenseSet<SymbolID>> FollowSets; + ---------------- make these members as private -- having two public members seems to violate the pattern. (FollowSets can be initialized in the constructor, we might need a separate method for `Reduces`. or either change the Builder to a struct). ================ Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:28 + auto G = Grammar::parseBNF(R"bnf( + _ := expr + expr := term ---------------- nit: we use the hard-coded ruleID in the following test, I'd suggest add a trailing comment for the rule ID for each rule here, for better code readability. ================ Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:38 + SymbolID Eof = tokenSymbol(tok::eof); + SymbolID Semi = tokenSymbol(tok::semi); + SymbolID Int = tokenSymbol(tok::kw_int); ---------------- It would be better to change it to an `IDENTIFIER` to reflect the grammar above. ================ Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:39 + SymbolID Semi = tokenSymbol(tok::semi); + SymbolID Int = tokenSymbol(tok::kw_int); + SymbolID Plus = tokenSymbol(tok::plus); ---------------- nit: I'd move this on Line67, as this token serves as an invalid token according to the grammar. ================ Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:42 + + // eof semi term reduce + // +-------+----+-------+------+------- ---------------- the row is usual for terminals, and the reduce is a separate table, which seems a bit confusing -- I would probably split the reduce table out (adding an empty col between term and reduce) ================ Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:45 + // |state0 | | s0 | | r0 + // |state1 | | | g3 | acc + // |state2 | | | | r1 ---------------- nit: `acc` => `R2 (acc)` (sorry, I didn't update the comment in the removing-accept patch) ================ Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:53 + std::vector<LRTable::ReduceEntry> ReduceEntries = { + {/* State=*/0, 0}, + {/* State=*/1, 2}, ---------------- nit: add a `/*RuleID=*/` comment annotation for the second argument. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128472/new/ https://reviews.llvm.org/D128472 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits