[PATCH] D130337: [pseudo] Eliminate multiple-specified-types ambiguities using guards

Haojian Wu via Phabricator via cfe-commits Fri, 22 Jul 2022 05:45:15 -0700

hokein added a comment.

Thanks, the change looks good in general.


> we handle *all* rules for the interesting node types explicitly, rather than 
> default: return false. This allows us to assert that all cases are handled, 
> so things don't "fall through the cracks" after grammar changes. Alternative 
> suggestions welcome. (I have a feeling this "exhaustive pattern-matching" 
> idea will come up again...)

The dangerous bit is that now we will crash at the runtime if the parsing code 
triggers a missing case.

Yeah, I hit this problem in the previous function-declarator patch. One 
approach will be to define an enum for each nonterminal `enum Nonterminal { 
rhs0_rhs1 }`, rather than put all rules into a single enum. It is easier to 
enumerate all values for a single nonterminal (I think this is the common case)

  namespace cxx {
  namespace rule {
  
  enum simple_type_specifier {
    builtin_type0,
    nested_name_specifier0_type_name1,
    ...
  }
  
  }
  }



In D130337#3671159 <https://reviews.llvm.org/D130337#3671159>, @sammccall wrote:

> FWIW, as-is with no caching, this is a ~2% slowdown on my machine (5.82 -> 
> 5.72 MB/s on SemaCodeComplete.cpp).

Actually, the number looks quite good to me (I expected that there will be a 
significant slowdown without caching). I think we can check in the current 
version, and add the caching stuff afterwards.

> Whereas D130150 <https://reviews.llvm.org/D130150> using the grammar is a a 
> 7% speedup (5.82 -> 6.22), so roughly an 9% performance difference between 
> the approaches.

Yeah, because the grammar-based approach reduces the number of ambiguous node 
in the forest.



================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:284
+           
defining_type_specifier_seq_0defining_type_specifier_1defining_type_specifier_seq,
+       GUARD(!hasExclusiveType(P.RHS.front()) ||
+             !hasExclusiveType(P.RHS.back()))},
----------------
nit: I would suggest using the index explicitly `P.RHS[0]`, `P.RHS[1]`, it 
increases the readability (the rul name encoding the index, easier to spot the 
corresponding element).


================
Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:353
 decl-specifier-seq := decl-specifier
-decl-specifier-seq := decl-specifier decl-specifier-seq
+decl-specifier-seq := decl-specifier decl-specifier-seq [guard]
 storage-class-specifier := STATIC
----------------
offtopic comment: The sad bit of the `RuleID` approach (vs `guard=SingleType`) 
is that we don't really know what kind of the guard is by reading the grammar 
file only.

I think this is critical information, and worth to keep in the grammar file. 
(ideas: add comments, or bring the `guard=SingleType` in the grammar again, but 
we ignore the `guard` value in the grammar parsing).


================
Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:385
+builtin-type := BOOL
 simple-type-specifier := SHORT
+builtin-type := INT
----------------
I think the reason to leave SHORT/LONG/SIGNED UNSIGNED as-is is that they can 
combined with other type (e.g. short int).

Can we group them together, and add a comment?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130337/new/

https://reviews.llvm.org/D130337

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130337: [pseudo] Eliminate multiple-specified-types ambiguities using guards

Reply via email to