[PATCH] D118196: [syntax][pseudo] Implement LR parsing table.

Haojian Wu via Phabricator via cfe-commits Mon, 07 Feb 2022 11:40:04 -0800

hokein marked 11 inline comments as done.
hokein added a comment.

comments around LRAutomaton should be addressed in the separate patch, 
https://reviews.llvm.org/D119172.

================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:32
+public:
+  Item(RuleID ID, uint8_t RuleLength, uint8_t DotPos)
+      : RID(ID), RuleLength(RuleLength), DotPos(DotPos) {}
----------------
sammccall wrote:
> accepting the grammar as a parameter rather than the rule length would let us 
> ensure the length is correct locally rather than relying on the caller.
> Essentially all the callers need to do a lookup anyway.
right, but we still need this one, for constructing an empty/Tombstone items 
for DenseMapInfo.

================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:66
+
+using ItemSet = std::set<Item>;
+// A state of the LR automaton is an item set, which contains all possible 
rules
----------------
sammccall wrote:
> why is std::set used here? (generally a data structure to avoid)
the main reason to we want a deterministic order for computing the hash value. 
Changed to a sorted vector.

================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:79
+
+// A deterministic finite state automaton for LR parsing.
+//
----------------
sammccall wrote:
> Hmm, isn't the point of GLR that it's a nondeterministic automaton?
> 
> We've done the LR-style NFA->DFA conversion, but it's not complete (when we 
> hit a conflict we continue rather than giving up)
yes and no. There are a few conceptually difference here.

GLR is indeed a nondeterministic parser, but I think the term 
deterministic/nondeterministic in parser is different than the ones used in 
automaton context. 
A deterministic parser (typical LR) has a property that there will always be 
only one possible action to choose from, thus it is fast (linear), and it 
produces only one syntax tree;
A deterministic automaton is described 
[here](https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton).

> We've done the LR-style NFA->DFA conversion, but it's not complete (when we 
> hit a conflict we continue rather than giving up)

Not exactly, there is no NFA->DFA conversion here. What this patch does:
1. we constructs a deterministic LR(0) automaton (DFA) directly, this is what 
LRAutomaton represents;
2. based on the DFA, we construct an LR table (called Action and GoTo Table in 
standard LR). In particular, we use the FollowSet to determine reduce actions, 
it is called SLR(1) table which is powerful the the LR(0) table.

A conflict in the LRTable doesn't imply an incomplete DFA. For an arbitrary 
grammar (no matter it is ambiguous or not), we can always construct a DFA. For 
example, based on the same LR(0) automaton, we could build a SLR(1) table 
(without conflicts) or LR(0) table (with conflicts). 
The real problem lies in the interpretation of states of the automaton, 
considering a node in the automaton with state 0
```
E := T. + E
E := T.
```
and the node as a "+" outer edge, if the next symbol is `+`, we have two 
options "reduce E := T" or shift `+`. SLR can tell reduction is not available 
`+` is not in the FOLLOW(E) (thus no conflict) while LR(0) will accept both 
(thus conflicts). 

I added some bits in the file comment of the LRGraph.h, hope it can clarify 
these concepts.

================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:109
+      auto Hash = hash_code(S);
+      auto It = StatesIndex.find(Hash);
+      if (It != StatesIndex.end())
----------------
sammccall wrote:
> using hashcode instead of the key itself should be explained.
> (If it's just an issue with hash of std::set, I think using std::vector fixes 
> it)
using the ItemSet as key seems heavy (for cxx.grammar, 65KB for 
`llvm::DenseMap<itemset, ..>` vs 32KB for `llvm::DenseMap<hash_code>`) and 
unnecessary (we don't need to access the Itemset).

Added comments.

================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:144
+           "augmented symbol rule must be _ := start_symbol.");
+    auto StartState = closure({{{RID, Rule.size(), 0}}}, G);
+    auto Result = Builder.insert(std::move(StartState));
----------------
sammccall wrote:
> nit: too many braces - please spell out some of the types
> (and Auto should probably be State here?)
> too many braces - please spell out some of the types
this is one of cons of using inlay-hints :(

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118196/new/

https://reviews.llvm.org/D118196

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D118196: [syntax][pseudo] Implement LR parsing table.

Reply via email to