[clang] [clang][Diagnostics] Highlight code snippets (PR #66514)

Richard Smith via cfe-commits Wed, 20 Sep 2023 10:48:40 -0700

Timm Bäder <tbae...@redhat.com>
In-Reply-To: <llvm/llvm-project/pull/66514/cl...@github.com>



================
@@ -0,0 +1,77 @@
+
+#include "clang/Frontend/CodeSnippetHighlighter.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace clang;
+
+static SourceManager createTempSourceManager() {
+  FileSystemOptions FileOpts;
+  FileManager FileMgr(FileOpts);
+  llvm::IntrusiveRefCntPtr<DiagnosticIDs> DiagIDs(new DiagnosticIDs());
+  llvm::IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts(new DiagnosticOptions());
+  DiagnosticsEngine diags(DiagIDs, DiagOpts);
+  return SourceManager(diags, FileMgr);
+}
+
+static Lexer createTempLexer(llvm::MemoryBufferRef B, SourceManager &FakeSM,
+                             const LangOptions &LangOpts) {
+  return Lexer(FakeSM.createFileID(B), B, FakeSM, LangOpts);
+}
+
+std::vector<StyleRange> CodeSnippetHighlighter::highlightLine(
+    StringRef SourceLine, const Preprocessor *PP, const LangOptions &LangOpts) {
+  if (!PP)
+    return {};
+  constexpr raw_ostream::Colors CommentColor = raw_ostream::BLACK;
+  constexpr raw_ostream::Colors LiteralColor = raw_ostream::GREEN;
+  constexpr raw_ostream::Colors KeywordColor = raw_ostream::YELLOW;
+
+  SourceManager FakeSM = createTempSourceManager();
+  const auto MemBuf = llvm::MemoryBuffer::getMemBuffer(SourceLine);
+  Lexer L = createTempLexer(MemBuf->getMemBufferRef(), FakeSM, LangOpts);
+  L.SetKeepWhitespaceMode(true);
----------------
zygoloid wrote:

While I think re-lexing the input to find the tokens is the right approach,
lexing the source line in isolation will give the wrong result in a lot of
cases. For example, a format string warning inside a multi-line raw string
literal will be highlighted incorrectly, because the lexer doesn't know that
the line starts inside the literal. But equally, re-lexing the entire file for
each diagnostic seems likely to be too slow. I can think of a few alternatives
here:

1) We could make the regular lexing process record some of the lines at whose
start the lexer is in its "normal" state. Whenever we're in the normal lexing
state at the start of a line, add the line number to a per-file list if it's
been "long enough" (maybe >1K of program text?) since we last did so. Then,
when emitting a diagnostic, we can find the most recent recorded line at or
before the diagnostic and lex forward from there to drive syntax highlighting.

2) We could make the diagnostics layer keep a cache of the tokenized forms of
buffers for which we emit diagnostics. We'd still re-lex an entire file if we
emit diagnostics within it, but we'd only do so *once*, and we wouldn't need to
store the full list of tokens, only a list of (offset, color) pairs marking the
transitions between token kinds.
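The bookkeeping behind both alternatives can be sketched roughly as below. This is a standalone C++17 sketch under stated assumptions, not Clang code: `LexCheckpoints`, `onLineStart`, `restartPointFor`, `HighlightCache`, `addToken`, `colorAt`, `numTransitions`, and the `Color` enum are hypothetical names. It also assumes some hook that tells the real lexer it is "in its normal state" at a line start; no such Clang API is implied. Offsets are byte offsets into the file buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for raw_ostream::Colors.
enum class Color { Default, Comment, Literal, Keyword };

// Option 1 (sketch): line-start offsets where the lexer was in its normal
// state, recorded during the regular lex and spaced at least MinGap bytes apart.
class LexCheckpoints {
public:
  explicit LexCheckpoints(std::size_t MinGap = 1024) : MinGap(MinGap) {}

  // Assumed to be called, in increasing offset order, at every line start.
  void onLineStart(std::size_t Offset, bool InNormalState) {
    if (!InNormalState)
      return; // e.g. inside a multi-line raw string literal
    if (Offsets.empty() || Offset - Offsets.back() >= MinGap)
      Offsets.push_back(Offset);
  }

  // Most recent checkpoint at or before Offset; 0 (start of file) if none.
  std::size_t restartPointFor(std::size_t Offset) const {
    auto It = std::upper_bound(Offsets.begin(), Offsets.end(), Offset);
    return It == Offsets.begin() ? 0 : *std::prev(It);
  }

private:
  std::size_t MinGap;
  std::vector<std::size_t> Offsets; // sorted ascending
};

// Option 2 (sketch): per-buffer cache of (offset, color) transitions, filled
// by lexing the whole buffer once.
class HighlightCache {
public:
  // Record that text from Offset onward has color C; only changes are stored.
  void addToken(std::size_t Offset, Color C) {
    if (Transitions.empty() || Transitions.back().second != C)
      Transitions.emplace_back(Offset, C);
  }

  // Color in effect at Offset, found by binary search over the transitions.
  Color colorAt(std::size_t Offset) const {
    auto It = std::upper_bound(
        Transitions.begin(), Transitions.end(), Offset,
        [](std::size_t O, const std::pair<std::size_t, Color> &T) {
          return O < T.first;
        });
    return It == Transitions.begin() ? Color::Default : std::prev(It)->second;
  }

  std::size_t numTransitions() const { return Transitions.size(); }

private:
  std::vector<std::pair<std::size_t, Color>> Transitions; // sorted by offset
};
```

With option 1, highlighting a diagnostic's line would mean lexing forward from `restartPointFor(LineStart)` instead of from the start of the file. With option 2, the whole buffer is lexed once and every later snippet is colored by `colorAt` lookups, so memory grows with the number of color changes rather than the number of tokens.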

Thoughts?

https://github.com/llvm/llvm-project/pull/66514
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
