sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.

Thanks for this! Mostly just some cleanups/comment nits left.

The substantive change is around the `\` and `  ` at end-of-line. It looks like 
the goal is on improving the parsing of markdown-formatted comments, but we 
need to focus as well on non-markdown-formatted comments as these are the 
majority case. (Most of your patch improves these a lot!)

If you don't have commit access yet, I'm happy to land this for you, just let 
me know when ready.
(A couple of patches like this are good grounds to apply for commit access, you 
can just link to the review threads)



================
Comment at: clang-tools-extra/clangd/Hover.cpp:530
+
+  return NextNonSpaceCharIndex != llvm::StringRef::npos &&
+         Str[NextNonSpaceCharIndex] == '\n';
----------------
We can make use of some StringRef helpers to make this more readable.
I think this is equivalent (to the whole function body):

```
return Str.substr(LineBreakIndex + 1).drop_while(isWhitespace).startswith("\n");
```


================
Comment at: clang-tools-extra/clangd/Hover.cpp:534
+
+bool isMardkownLineBreak(llvm::StringRef Str, size_t LineBreakIndex) {
+  return LineBreakIndex > 1 &&
----------------
I'm not altogether happy with these rules: many comments are not markdown, and 
these don't seem very safe for non-markdown comments.

Comments ending in a trailing slash may be explicitly indicating a **soft** 
line break e.g.

```
// Example usage:
// my_prog --some_flag = foo \
//    --remote_database=/path/to/whatever \
//    input.txt
```

and trailing whitespace is often just sloppiness.

I did a search over a large corpus of open-source C++.
 - for `//.*\ \ $`: Of a random 10 results, none appear to be markdown. 
 - for `//.*\\$`: Of a random 10 results, none were markdown. (4 were explicit 
soft breaks, 4 were ascii art, 2 were soft breaks in commented-out macro 
definitions)


================
Comment at: clang-tools-extra/clangd/Hover.cpp:535
+bool isMardkownLineBreak(llvm::StringRef Str, size_t LineBreakIndex) {
+  return LineBreakIndex > 1 &&
+         (Str[LineBreakIndex - 1] == '\\' ||
----------------
What about:
```
StringRef Prev = Str.substr(0, LineBreakIndex);
return Prev.endswith("\\") || Prev.endswith("  ")
```

(this seems more readable, and fixes the bug if LineBreakIndex == 1 and the 
string is {backslash, newline, ...}.)


================
Comment at: clang-tools-extra/clangd/Hover.cpp:544
+  return LineBreakIndex > 0 &&
+         Punctuation.find(Str[LineBreakIndex - 1]) != llvm::StringRef::npos;
+};
----------------
nit: Punctuation.contains(...)


================
Comment at: clang-tools-extra/clangd/Hover.cpp:733
 
+// Tries to retain hard line breaks and drops soft line breaks.
+void parseLineBreaks(llvm::StringRef Input, markup::Document &Output) {
----------------
nit: please put the doc on the declaration rather than the definition.

(I think the previous name was more appropriate to the signature, but up to you)


================
Comment at: clang-tools-extra/clangd/Hover.cpp:750
+
+      if (isParagraphLineBreak(TrimmedInput, CharIndex)) {
+        // TODO this should add an actual paragraph (double linebreak)
----------------
rather than repeating the code twice, please just use if isParagraph() || 
isHard(), with the comment that we may treat paragraph differently


================
Comment at: clang-tools-extra/clangd/Hover.cpp:751
+      if (isParagraphLineBreak(TrimmedInput, CharIndex)) {
+        // TODO this should add an actual paragraph (double linebreak)
+        Output.addParagraph().appendText(CurrentLine);
----------------
nit: llvm tends to use "// FIXME: "


================
Comment at: clang-tools-extra/clangd/Hover.cpp:751
+      if (isParagraphLineBreak(TrimmedInput, CharIndex)) {
+        // TODO this should add an actual paragraph (double linebreak)
+        Output.addParagraph().appendText(CurrentLine);
----------------
sammccall wrote:
> nit: llvm tends to use "// FIXME: "
please soften this to "maybe".
(Because we need to consider the padding/css issues of doing so, whether a hard 
break vs double break reflects different intents or just different conventions 
in different codebases, and whether we have a reasonable way to implement this 
- paragraphs have leading as well as trailing padding, so it gets complicated 
fast).

and rephrase to "should distinguish between line breaks and paragraphs".
("Double linebreak" is confusing as it doesn't mean anything at the 
markup::Document level, and markdown paragraph isn't the same concept as a 
plaintext blank line, even if they are both encoded as "\n\n")


================
Comment at: clang-tools-extra/clangd/unittests/HoverTests.cpp:1891
+    llvm::StringRef ExpectedRenderPlainText;
+  } Cases[] = {{
+                   "foo\n\n\nbar",
----------------
please add tests starting/ending with newline as these are special cases in the 
code


================
Comment at: clang-tools-extra/clangd/unittests/HoverTests.cpp:1943
+                   "foo\n*bar",
+                   // TODO `*` should probably not be escaped after line break
+                   "foo  \n\\*bar",
----------------
please drop these comments, as they don't seem to be true (unless assuming the 
content is markdown, which it usually isn't)

I think we can drop most of the tests too (for specific characters after 
newline), as we already have escaping tests and these don't actually test 
different linebreaking behaviour.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76094/new/

https://reviews.llvm.org/D76094



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to