hokein added inline comments.
================ Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:84 + + Kind K = Kind::Error; + // Value ---------------- sammccall wrote: > This action struct can be squeezed to 16 bits. > > Kind only needs 3 bits (I suspect it can be 2: Error and Accept aren't > heavily used). > RuleID is 12 bits, StateID would fit within 11 and 12 should be safe. > > Combined with the optimization suggested below, this should get the whole > table down to 335KB. squeezed to 16 bits (3 bits for Kind, and 13 bits for the Value). ================ Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:105 + auto Goto = find({From, NonTerminal}); + assert(Goto.size() == 1 && Goto.front().isGoTo()); + return Goto.front().goTo(); ---------------- sammccall wrote: > Why is this assertion valid? Is it a precondition of the function that some > item in From can advance over NonTerminal? This is guaranteed by a DFA (a node can not have two out edges with a same label), the same reason why we don't have shift/shift conflicts. ================ Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:121 +private: + struct IndexKey { + StateID From; ---------------- sammccall wrote: > I think there's an even more compact representation: > Basically sort the index keys as symbol-major, but then don't store the > actual symbol there but rather store the range for the symbol in an external > array: > > ``` > // index is lookahead tok::TokenKind. Values are indices into Index: the > range for tok is StateIdx[TokenIdx[tok]...TokenIdx[tok+1]] > vector<unsigned> TokenIdx; > > // index is nonterminal SymbolID. Values are indices into Index: the range > for sym is StateIdx[NontermIdx[sym]...NontermIdx[sym+1]]. > vector<unsigned> NontermIdx; > > // parallel to actions, value is the from state. Only subranges are sorted. > vector<StateID> StateIdx; > > vector<Action> Actions; > ``` > > This should be 4 * (392 + 245) + (2 + 4) * 83k ~= 500k. And the one extra > indirection should save 5-10 steps of binary search. > > This is equivalent to 2 * `vector<unsigned>` + `vector<pair<StateID, > Action>>` but keeping the searched index compact is probably a good thing. This looks like a nice improvement, though the implementation is a little tricky -- we reduced the binary-search space from 2 dimension (State, Symbol) to 1 dimension Symbol. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D118196/new/ https://reviews.llvm.org/D118196 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits