Timm Bäder <tbae...@redhat.com>
In-Reply-To: <llvm/llvm-project/pull/66514/cl...@github.com>
================
@@ -0,0 +1,77 @@
+
+#include "clang/Frontend/CodeSnippetHighlighter.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace clang;
+
+static SourceManager createTempSourceManager() {
+  FileSystemOptions FileOpts;
+  FileManager FileMgr(FileOpts);
+  llvm::IntrusiveRefCntPtr<DiagnosticIDs> DiagIDs(new DiagnosticIDs());
+  llvm::IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts(new DiagnosticOptions());
+  DiagnosticsEngine diags(DiagIDs, DiagOpts);
+  return SourceManager(diags, FileMgr);
+}
+
+static Lexer createTempLexer(llvm::MemoryBufferRef B, SourceManager &FakeSM,
+                             const LangOptions &LangOpts) {
+  return Lexer(FakeSM.createFileID(B), B, FakeSM, LangOpts);
+}
+
+std::vector<StyleRange> CodeSnippetHighlighter::highlightLine(
+    StringRef SourceLine, const Preprocessor *PP, const LangOptions &LangOpts) {
+  if (!PP)
+    return {};
+  constexpr raw_ostream::Colors CommentColor = raw_ostream::BLACK;
+  constexpr raw_ostream::Colors LiteralColor = raw_ostream::GREEN;
+  constexpr raw_ostream::Colors KeywordColor = raw_ostream::YELLOW;
+
+  SourceManager FakeSM = createTempSourceManager();
+  const auto MemBuf = llvm::MemoryBuffer::getMemBuffer(SourceLine);
+  Lexer L = createTempLexer(MemBuf->getMemBufferRef(), FakeSM, LangOpts);
+  L.SetKeepWhitespaceMode(true);
----------------
zygoloid wrote:

While I think re-lexing the input to find the tokens is the right approach, starting with the source line in isolation is going to do the wrong thing in a lot of cases. For example, a format string warning inside a multi-line raw string literal will get bad highlighting due to not taking the initial lexing state for the line into account. But equally, re-lexing the entire file seems like it's going to be problematic from a performance perspective.
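To make the raw string case concrete, here is a minimal sketch (not part of the patch) that computes, for each line, whether the line begins inside a raw string literal. The `lineStartsInRawString` helper is hypothetical and deliberately simplified: it only recognizes the `R"(` ... `)"` form with an empty delimiter, and it ignores comments, ordinary string literals, and character literals.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only. Returns one flag per line:
// true if that line begins inside a raw string literal.
// Simplification: only R"( ... )" with an empty delimiter is recognized.
std::vector<bool> lineStartsInRawString(const std::string &Text) {
  std::vector<bool> Result;
  bool InRaw = false;
  Result.push_back(InRaw); // state at the start of the first line
  for (std::size_t I = 0; I < Text.size(); ++I) {
    if (!InRaw && Text.compare(I, 3, "R\"(") == 0) {
      InRaw = true;
      I += 2; // skip the rest of the opening sequence
    } else if (InRaw && Text.compare(I, 2, ")\"") == 0) {
      InRaw = false;
      I += 1; // skip the closing quote
    } else if (Text[I] == '\n') {
      Result.push_back(InRaw); // state at the start of the next line
    }
  }
  return Result;
}
```

For a `printf(R"(` call whose second line is `"value": %s // not a comment`, the flag for that second line is true. A lexer started on that line alone has no way to know this, so it would likely treat `"value"` as a string literal and the trailing text as a comment.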
I can think of a few alternatives here:

1) We could make the regular lexing process keep track of some of the lines where the lexer is in its "normal" state at the start of the line -- whenever we're in the normal lexing state at the start of a line, add the line number to a per-file list if it's been "long enough" (maybe >1K of program text?) since we last did so. Then when emitting diagnostics, we can find the most recent line where we were at a good state at the start of the line, and lex forward from there to drive syntax highlighting.

2) We could make the diagnostics layer keep a cache of the tokenized forms of buffers for which we emit diagnostics. We'd still re-lex an entire file if we emit diagnostics within it, but we'd only do so *once*, and we don't need to store the full list of tokens, only a list of (offset, color) pairs for transitions between token kinds.

Thoughts?

https://github.com/llvm/llvm-project/pull/66514
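A rough sketch of what option 2 could look like, to show the shape of the data rather than a real implementation. Every name here (`Color`, `Transition`, `HighlightCache`, `toyTokenize`) is hypothetical and none of it is Clang API. `toyTokenize` stands in for a full `Lexer` pass over the buffer and only recognizes `//` comments.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// All names below are hypothetical; none of this is Clang API.
enum class Color { Default, Comment };

// One entry per point where the highlighting color changes.
struct Transition {
  std::size_t Offset; // byte offset into the buffer
  Color C;            // color in effect from Offset onward
};

// Stand-in for a real lexer pass over the whole buffer: marks `//` comments
// only, and records just the color transitions, sorted by offset.
std::vector<Transition> toyTokenize(const std::string &Buf) {
  std::vector<Transition> Ts;
  std::size_t Pos = 0;
  while ((Pos = Buf.find("//", Pos)) != std::string::npos) {
    Ts.push_back({Pos, Color::Comment});
    Pos = Buf.find('\n', Pos);
    if (Pos == std::string::npos)
      break;
    Ts.push_back({Pos, Color::Default});
  }
  return Ts;
}

class HighlightCache {
public:
  // Color at Offset in the buffer identified by BufferID. The buffer is
  // tokenized on the first query only; later queries reuse the stored list.
  Color colorAt(int BufferID, const std::string &Buffer, std::size_t Offset) {
    auto It = Cache.find(BufferID);
    if (It == Cache.end()) {
      It = Cache.emplace(BufferID, toyTokenize(Buffer)).first;
      ++NumLexes;
    }
    const std::vector<Transition> &Ts = It->second;
    // Find the first transition strictly after Offset; the one just before
    // it (if any) gives the color in effect at Offset.
    auto Next = std::upper_bound(
        Ts.begin(), Ts.end(), Offset,
        [](std::size_t O, const Transition &T) { return O < T.Offset; });
    return Next == Ts.begin() ? Color::Default : std::prev(Next)->C;
  }

  unsigned NumLexes = 0; // how many times a buffer was tokenized

private:
  std::map<int, std::vector<Transition>> Cache;
};
```

The point of storing transitions instead of tokens is that a lookup is a binary search over a small sorted vector, and the per-buffer memory cost scales with the number of color changes, not the number of tokens.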