ken-matsui added inline comments.

================
Comment at: clang/test/Preprocessor/suggest-typoed-directive.c:10
+// expected-warning@+11 {{'#elfidef' directive not found, did you mean 
'#elifdef'?}}
+// expected-warning@+11 {{'#elfindef' directive not found, did you mean 
'#elifdef'?}}
+// expected-warning@+11 {{'#elsi' directive not found, did you mean '#else'?}}
----------------
aaron.ballman wrote:
> ken-matsui wrote:
> > aaron.ballman wrote:
> > > erichkeane wrote:
> > > > aaron.ballman wrote:
> > > > > ken-matsui wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > ken-matsui wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > ken-matsui wrote:
> > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > It's interesting that this one suggested `#elifdef` 
> > > > > > > > > > > instead of `#elifndef` -- is there anything that can be 
> > > > > > > > > > > done for that?
> > > > > > > > > > > 
> > > > > > > > > > > Also, one somewhat interesting question is whether we 
> > > > > > > > > > > want to recommend `#elifdef` and `#elifndef` outside of 
> > > > > > > > > > > C2x/C++2b mode. Those directives only exist in the latest 
> > > > > > > > > > > language standard, but Clang supports them as a 
> > > > > > > > > > > conforming extension in all language modes. Given that 
> > > > > > > > > > > this diagnostic is about typos, I think I'm okay 
> > > > > > > > > > > suggesting the directives even in older language modes. 
> > > > > > > > > > > That's as likely to be a correct suggestion as not, IMO.
> > > > > > > > > > > It's interesting that this one suggested `#elifdef` 
> > > > > > > > > > > instead of `#elifndef` -- is there anything that can be 
> > > > > > > > > > > done for that?
> > > > > > > > > > 
> > > > > > > > > > I found I have to use `std::min_element` instead of 
> > > > > > > > > > `std::max_element` because we are finding the nearest (most 
> > > > > > > > > > minimum distance) string. Meanwhile, `#elfindef` has 2 
> > > > > > > > > > distance with both `#elifdef` and `#elifndef`. After 
> > > > > > > > > > replacing `std::max_element` with `std::min_element`, I 
> > > > > > > > > > could suggest `#elifndef` from `#elfinndef`.
> > > > > > > > > > 
> > > > > > > > > > > Also, one somewhat interesting question is whether we 
> > > > > > > > > > > want to recommend `#elifdef` and `#elifndef` outside of 
> > > > > > > > > > > C2x/C++2b mode. Those directives only exist in the latest 
> > > > > > > > > > > language standard, but Clang supports them as a 
> > > > > > > > > > > conforming extension in all language modes. Given that 
> > > > > > > > > > > this diagnostic is about typos, I think I'm okay 
> > > > > > > > > > > suggesting the directives even in older language modes. 
> > > > > > > > > > > That's as likely to be a correct suggestion as not, IMO.
> > > > > > > > > > 
> > > > > > > > > > I agree with you because Clang implements those directives, 
> > > > > > > > > > and the suggested code will also be compilable. I 
> > > > > > > > > > personally think only not compilable suggestions should be 
> > > > > > > > > > avoided. (Or, we might place additional info for outside of 
> > > > > > > > > > C2x/C++2b mode like `this is a C2x/C++2b feature but 
> > > > > > > > > > compilable on Clang`?)
> > > > > > > > > > 
> > > > > > > > > > ---
> > > > > > > > > > 
> > > > > > > > > > According to the algorithm of `std::min_element`, we only 
> > > > > > > > > > get an iterator of the first element even if there is 
> > > > > > > > > > another element that has the same distance. So, `#elfindef` 
> > > > > > > > > > only suggests `#elifdef` in accordance with the order of 
> > > > > > > > > > `Candidates`, and I don't think it is beautiful to depend 
> > > > > > > > > > on the order of candidates. I would say that we can suggest 
> > > > > > > > > > all the same distance like the following, but I'm not sure 
> > > > > > > > > > this is preferable:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > #elfindef // diag: unknown directive, did you mean #elifdef 
> > > > > > > > > > or #elifndef?
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > I agree with you because Clang implements those directives, 
> > > > > > > > > > and the suggested code will also be compilable. I 
> > > > > > > > > > personally think only not compilable suggestions should be 
> > > > > > > > > > avoided. (Or, we might place additional info for outside of 
> > > > > > > > > > C2x/C++2b mode like this is a C2x/C++2b feature but 
> > > > > > > > > > compilable on Clang?)
> > > > > > > > > 
> > > > > > > > > I may be changing my mind on this a bit. I now see we don't 
> > > > > > > > > issue an extension warning when using `#elifdef` or 
> > > > > > > > > `#elifndef` in older language modes. That means suggesting 
> > > > > > > > > those will be correct but only for Clang, and the user won't 
> > > > > > > > > have any other diagnostics to tell them about the portability 
> > > > > > > > > issue. And those particular macros are reasonably likely to 
> > > > > > > > > be used in a header where the user is trying to aim for 
> > > > > > > > > portability. So I'm starting to think we should only suggest 
> > > > > > > > > those two in C2x mode (and we should probably add a 
> > > > > > > > > portability warning for #elifdef and #elifndef, so I filed: 
> > > > > > > > > https://github.com/llvm/llvm-project/issues/55306)
> > > > > > > > > 
> > > > > > > > > > I would say that we can suggest all the same distance like 
> > > > > > > > > > the following, but I'm not sure this is preferable:
> > > > > > > > > 
> > > > > > > > > The way we typically handle this is to attach FixIt hints to 
> > > > > > > > > a note instead of to the diagnostic. This way, automatic 
> > > > > > > > > fixes aren't applied (because there are multiple choices to 
> > > > > > > > > pick from) but the user is still able to apply whichever fix 
> > > > > > > > > they want in an IDE or other tool. It might be worth trying 
> > > > > > > > > that approach (e.g., if there's only one candidate, attach it 
> > > > > > > > > to the warning, and if there are two or more, emit a warning 
> > > > > > > > > without a "did you mean" in it and use a new note for the 
> > > > > > > > > fixit. e.g.,
> > > > > > > > > ```
> > > > > > > > > #elfindef // expected-warning {{invalid preprocessing 
> > > > > > > > > directive}} \
> > > > > > > > >              expected-note {{did you mean '#elifdef'?}} \
> > > > > > > > >              expected-note {{did you mean '#elifndef'?}}
> > > > > > > > > ```
> > > > > > > > > WDYT?
> > > > > > > > > I may be changing my mind on this a bit. I now see we don't 
> > > > > > > > > issue an extension warning when using `#elifdef` or 
> > > > > > > > > `#elifndef` in older language modes. That means suggesting 
> > > > > > > > > those will be correct but only for Clang, and the user won't 
> > > > > > > > > have any other diagnostics to tell them about the portability 
> > > > > > > > > issue. And those particular macros are reasonably likely to 
> > > > > > > > > be used in a header where the user is trying to aim for 
> > > > > > > > > portability. So I'm starting to think we should only suggest 
> > > > > > > > > those two in C2x mode (and we should probably add a 
> > > > > > > > > portability warning for #elifdef and #elifndef, so I filed: 
> > > > > > > > > https://github.com/llvm/llvm-project/issues/55306)
> > > > > > > > >
> > > > > > > > 
> > > > > > > > Certainly, it would be less confusing to the user to avoid 
> > > > > > > > suggestions regarding those two.
> > > > > > > > I'm going to fix my code to avoid suggesting them in not C2x 
> > > > > > > > mode.
> > > > > > > > 
> > > > > > > > Thank you for submitting the issue, I also would like to work 
> > > > > > > > on it.
> > > > > > > > 
> > > > > > > > > The way we typically handle this is to attach FixIt hints to 
> > > > > > > > > a note instead of to the diagnostic. This way, automatic 
> > > > > > > > > fixes aren't applied (because there are multiple choices to 
> > > > > > > > > pick from) but the user is still able to apply whichever fix 
> > > > > > > > > they want in an IDE or other tool. It might be worth trying 
> > > > > > > > > that approach (e.g., if there's only one candidate, attach it 
> > > > > > > > > to the warning, and if there are two or more, emit a warning 
> > > > > > > > > without a "did you mean" in it and use a new note for the 
> > > > > > > > > fixit. e.g.,
> > > > > > > > > ```
> > > > > > > > > #elfindef // expected-warning {{invalid preprocessing 
> > > > > > > > > directive}} \
> > > > > > > > >              expected-note {{did you mean '#elifdef'?}} \
> > > > > > > > >              expected-note {{did you mean '#elifndef'?}}
> > > > > > > > > ```
> > > > > > > > > WDYT?
> > > > > > > > 
> > > > > > > > This is really cool, but don't you care about the redundancy of 
> > > > > > > > `did you mean` in terms of the llvm team culture?
> > > > > > > > If not, I will implement notes like the above.
> > > > > > > > Certainly, it would be less confusing to the user to avoid 
> > > > > > > > suggestions regarding those two. I'm going to fix my code to 
> > > > > > > > avoid suggesting them in not C2x mode.
> > > > > > > 
> > > > > > > +1, thank you!
> > > > > > > 
> > > > > > > > This is really cool, but don't you care about the redundancy of 
> > > > > > > > did you mean in terms of the llvm team culture? If not, I will 
> > > > > > > > implement notes like the above.
> > > > > > > 
> > > > > > > I would care if the list were potentially unbounded (like, say, 
> > > > > > > with identifiers in general), but because we know this list will 
> > > > > > > only have a max of two entries on it in this case, it seems 
> > > > > > > reasonable to me. I double-checked with @erichkeane to see if he 
> > > > > > > thought it would be an issue, and he agreed that it being a fixed 
> > > > > > > list makes it pretty reasonable.
> > > > > > > 
> > > > > > > However, that discussion did raise a question -- why are there 
> > > > > > > two suggestions? elifdef requires a swap + delete and elifndef 
> > > > > > > requires just a swap, so we would have thought that it would have 
> > > > > > > been the only option in the list.
> > > > > > With the implementation of Lev distances used in llvm, I could 
> > > > > > simply suggest `#elifdef` from `#elfidef` and `#elifndef` from 
> > > > > > `#elfindef`.
> > > > > > 
> > > > > > So, in this situation, do you think that we still need to add two 
> > > > > > notes for conflicted distances?
> > > > > No, let's skip the two note behavior. If we find ourselves with 
> > > > > multiple suggestions, we'll just leave off the "did you mean?" part 
> > > > > of the diagnostic entirely.
> > > > Might I suggest we just emit the 'first' suggestion?  This isn't really 
> > > > perfect, but I would think that our users put very little thought into 
> > > > it when we suggest the wrong thing.  
> > > Notionally that makes sense, but the situation I'm worried about is when 
> > > users pass `-fixit` to the compiler -- then it automatically does 
> > > whatever, but we have no idea whether that whatever is reasonable or not 
> > > when we have more than one option.
> > I could suggest `#if` from `#id`, but I had to change `#elfindef` to 
> > `#elfinndef` to suggest `#elifndef`.
> > After setting the `AllowReplacements` option to `true`, `#elfindef` 
> > suggested `#elifdef` (I think it might have the same distance with 
> > `#elifndef`).
> Thanks! I'm coming around more to the idea that we'll never get the "perfect" 
> set of suggestions, and so whatever suggestions we get are quite likely good 
> enough. However, I think it would be helpful to also add the tests where the 
> behavior seems a bit strange, like `#elfindef`, just to show that we know the 
> behavior is what it is and it doesn't do bad things like crash or whatnot.
Thank you :) Ok, I'll add the test as well.


================
Comment at: llvm/lib/Support/StringRef.cpp:102
+// Find a similar string in `Candidates`.
+Optional<std::string> StringRef::find_similar_str(const std::vector<StringRef> 
&Candidates, size_t Dist) const {
+  // We need to check if `rng` has the exact case-insensitive string because 
the Levenshtein distance match does not
----------------
aaron.ballman wrote:
> I am less convinced that we want this as a general interface on `StringRef`. 
> I think callers need to decide whether something is similar enough or not. 
> For example, case sensitivity may not matter for this case but it might 
> matter to another case. Others may wish to limit the max edit_distance for 
> each of the candidates for performance reasons, others may not. We already 
> saw there were questions about whether to allow replacements or not. That 
> sort of thing.
> 
> I think this functionality should be moved to PPDirectives.cpp as a static 
> helper function that's specific to this use. Also, because this returns one 
> of the candidates passed in, and those are a `StringRef`, I think this 
> function should return a `StringRef` -- this removes allocation costs. I'd 
> also drop the `Dist` parameter as the function is no longer intended to be a 
> general purpose one.
I am also in favor of it, but where should I put tests for this functionality?
Is it possible to write unit tests for static functions implemented in `.cpp` 
files?


================
Comment at: llvm/lib/Support/StringRef.cpp:118-135
+  std::vector<std::pair<std::string, size_t>> Cand;
+  for (StringRef C : Candidates) {
+    size_t CurDist = edit_distance(C, true);
+    if (CurDist <= MaxDist) {
+      Cand.emplace_back(C, CurDist);
+    }
+  }
----------------
aaron.ballman wrote:
> I think we can skip the vector entirely by keeping track of the min edit 
> distance we've seen thus far and which candidate that corresponds to when 
> looping over the candidates.
:+1:


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124726/new/

https://reviews.llvm.org/D124726

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to