sammccall added a comment.

Implementation looks good. I can't see a better way to solve this problem, it's 
just a bit unfortunate to have a sophisticated solution but not be able to turn 
it on by default.

I think naming is important here: it's a fairly complicated feature that (I 
suppose) relatively few will use, so having an unambiguous way to refer to it 
e.g. in docs will reduce the cost/confusion.
I suggested "QueryDriver" below, but we might be able to come up with something 
better offline.



================
Comment at: clang-tools-extra/clangd/ClangdServer.h:127
+
+    /// If set clangd will execute compiler drivers matching the following 
regex
+    /// to fetch system include path.
----------------
not a regex


================
Comment at: clang-tools-extra/clangd/ClangdServer.h:127
+
+    /// If set clangd will execute compiler drivers matching the following 
regex
+    /// to fetch system include path.
----------------
sammccall wrote:
> not a regex
I think this probably wants to be a vector<string> (comma-separated on 
command-line)?

Clangd-level settings aren't easy to customize per-project.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:65
 
+std::vector<std::string> parseDriverOutput(llvm::StringRef Output) {
+  std::vector<std::string> SystemIncludes;
----------------
Can we move the implementation out to a separate file?
It's pretty logically separate, and about as much code as everything else 
together.
It could also use a file-level comment describing how the scheme works, and 
also the reasons it's not always on (security, and not all argv0's are 
gcc-compatible)
(Sharing the same header seems fine)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:71
+  Output.split(Lines, '\n', /*MaxSplit=*/-1, /*KeepEmpty=*/false);
+  bool FoundStart = false;
+  for (llvm::StringRef Line : Lines) {
----------------
I'd consider `std::search` for the delimiters explicitly - I think it's clearer 
what happens when you find one but not the other.

(And you probably want to log an error and do nothing unless you find both)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:82
+
+    if (!llvm::sys::fs::exists(Line))
+      elog("Parsed non-existing system include: {0}", Line);
----------------
Is this check important? (What happens if non-existent dirs are on the include 
path?)

If it is, maybe we should pass in the VFS (factory?) here.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:90
+
+std::vector<std::string> extractSystemIncludes(PathRef Driver, PathRef File) {
+  trace::Span Tracer("Extract system includes");
----------------
some explanation of what this is doing?

e.g. "Many compilers have default system includes... these are known by the 
driver and passed to cc1 ... gcc and compilers with compatible CLI can dump the 
cc1 invocation information with this syntax... we do so and parse the result."


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:96
+  llvm::SmallString<128> OutputPath;
+  const auto EC = llvm::sys::fs::createTemporaryFile("system-includes",
+                                                     "clangd", OutputPath);
----------------
nit: please don't bother with const on locals (style is debatable, but it's 
hardly used anywhere and adds little value unless used consistently)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:96
+  llvm::SmallString<128> OutputPath;
+  const auto EC = llvm::sys::fs::createTemporaryFile("system-includes",
+                                                     "clangd", OutputPath);
----------------
sammccall wrote:
> nit: please don't bother with const on locals (style is debatable, but it's 
> hardly used anywhere and adds little value unless used consistently)
we need to remove this file at some point


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:99
+  if (EC) {
+    elog("Couldn't create temporary file: {0}", EC.message());
+    return {};
----------------
log messages should contain more context, e.g. "Driver flags extraction: failed 
to create temporary file"


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:108
+  // Should we also preserve flags like "-sysroot", "-nostdinc" ?
+  const llvm::StringRef Args[] = {"-E", "-x", driver::types::getTypeName(Type),
+                                  "-", "-v"};
----------------
this will crash if the lookup didn't provide a "real" type


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:112
+  const int RC =
+      llvm::sys::ExecuteAndWait(Driver, Args, /*Env=*/llvm::None, Redirects);
+  if (RC) {
----------------
should we check the driver exists before executing it? the main advantage would 
be that we avoid logging this as if it were an error (or we do so with a better 
error message)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:125
+
+  return parseDriverOutput(BufOrError->get()->getBuffer());
+}
----------------
I think we should log the success case too... remember this is only going to 
happen once per driver (or {driver, filetype}).


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:129
+tooling::CompileCommand
+addSystemIncludes(tooling::CompileCommand Cmd,
+                  llvm::ArrayRef<std::string> SystemIncludes) {
----------------
just take cmd by reference and modify it?


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:140
+llvm::Regex convertGlobToRegex(llvm::StringRef Glob) {
+  const llvm::StringRef MetaChars("()^$|*+?.[]\\{}");
+  std::string RegText;
----------------
please use llvm::Regex::escape() on the chunks between `*`, so we're sure to be 
consistent


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:178
+            convertGlobToRegex(TrustedCompilerDriverGlob)),
+        Base(std::move(Base)) {
+    assert(this->Base && "Base cannot be null!");
----------------
I think you need to subscribe to changes in the base and broadcast them to your 
own subscribers.
As it stands this breaks background indexing I think.
This pattern is error prone, as it's too easy to forget :-(


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:190
+
+    llvm::StringRef Driver = Cmd->CommandLine.front();
+
----------------
check the CommandLine isn't empty? (I know, silly...)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:190
+
+    llvm::StringRef Driver = Cmd->CommandLine.front();
+
----------------
sammccall wrote:
> check the CommandLine isn't empty? (I know, silly...)
The flag values will be absolute paths (or must be, to make any security 
sense), so we need to make Driver absolute before checking against it (using 
the command's working directory). And we need to make sure the string we exec 
is the same as the one we check, so that needs to be absolute too.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:195
+      std::lock_guard<std::mutex> Lock(Mu);
+      if (!TrustedCompilerDriverRegex.match(Driver))
+        return Cmd;
----------------
regex check is part of the cacheable computation.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:198
+
+      auto It = DriverToIncludes.try_emplace(Driver);
+      if (It.second)
----------------
you're computing based on the filename, but caching based only on the driver.

If the filename is important, I think we should probably restructure so you get 
the filetype here and cache by it instead.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:210
+  // Caches includes extracted from a driver.
+  mutable llvm::StringMap<std::vector<std::string>> DriverToIncludes;
+  mutable llvm::Regex TrustedCompilerDriverRegex;
----------------
put "cache" in the name?


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.h:96
+/// the compile flags.
+/// Returns nullptr when \p TrustedCompilerDriverGlob is empty or \p Base is
+/// nullptr.
----------------
If the trusted list is empty, it seems reasonable just to return Base. Then the 
caller can skip the check.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.h:96
+/// the compile flags.
+/// Returns nullptr when \p TrustedCompilerDriverGlob is empty or \p Base is
+/// nullptr.
----------------
sammccall wrote:
> If the trusted list is empty, it seems reasonable just to return Base. Then 
> the caller can skip the check.
if null base is forbidden (seems to be the intent here), just say so, don't 
specify what happens in that case (and probably add an assert)


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:271
 
+static llvm::cl::opt<std::string> TrustedCompilerDriverGlob(
+    "trusted-compiler-driver-glob",
----------------
I'm not sure this is the right name, because "trusted" doesn't indficate what 
the feature does or when you might need it. And we've mitigated most of the 
security risk by turning this off by default, I'm not sure we actually need a 
warning in the flag name.

Maybe `-query-driver=`?

(commenting here because it's the main user-visible documentation, but I think 
we should use consistent names throughout, e.g. QueryDriverGlob).


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:273
+    "trusted-compiler-driver-glob",
+    llvm::cl::desc("Tells clangd to extract system includes from drivers "
+                   "maching the glob. Only accepts ** for sequence match and * 
"
----------------
nit: drop "tells clangd to", it's implied


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:273
+    "trusted-compiler-driver-glob",
+    llvm::cl::desc("Tells clangd to extract system includes from drivers "
+                   "maching the glob. Only accepts ** for sequence match and * 
"
----------------
sammccall wrote:
> nit: drop "tells clangd to", it's implied
from gcc-compatible drivers


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:274
+    llvm::cl::desc("Tells clangd to extract system includes from drivers "
+                   "maching the glob. Only accepts ** for sequence match and * 
"
+                   "for non-sequence match."),
----------------
I think it's more useful to give an example (/usr/bin/**/clang-*) than try to 
describe glob syntax.


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:276
+                   "for non-sequence match."),
+    llvm::cl::init(""));
+
----------------
is there something useful we should set this to by default? like 
`/usr/bin/gcc,/usr/bin/g++`?

Or is the assumption that the standard/system compilers will never have weird 
information to extract?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62804/new/

https://reviews.llvm.org/D62804



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to