ilya-biryukov added a comment.

Probably the most important suggestion from my side is trying to clearly 
separate the `enqueue` calls (which actually schedule rebuilds of TUs) using 
clang and the `loadShards` function.
The index loading should be fast, so I assume we won't win much latency by 
scheduling the indexing actions in the middle of the shard-loading.
OTOH, separating those two concerns should make the code way more readable (I 
think, at least).

The concrete proposal is to make `loadShards` actually **only** the shards and 
return a list of TUs that still need to be rebuilt. As the second step, we can 
`enqueue` the rebuild of those TUs.

Another important thing that's worth doing is documenting the correctness 
trade-offs we have in the current model, i.e. when following the include edges 
of a file (say, `foo.h`) that had its digest changed, we can potentially:

1. load stale symbols for some other file (say, `bar.h`) previously included by 
it,
2. never update those symbols to fresh ones if the include edge was dropped, 
i.e. `foo.h` does not include `bar.h` anymore.

The described situation can lead to stale symbols for `bar.h` being kept 
indefinitely in some cases, but has the advantage of reparsing less TUs when 
rebuilding the index. Overall it's definitely fine to have this trade-off for 
the time-being, but would be nice if we could document it.

The rest is mostly NITs and code style comments.

And thanks for the change, it's an impressive piece of work!



================
Comment at: clangd/SourceCode.h:96
+
+// Handle the symbolic link path case where the current working directory
+// (getCurrentWorkingDirectory) is a symlink./ We always want to the real
----------------
We might want to have a guidance on when this function should be used, similar 
to `getRealPath`. Also mention why it's different from the above.

My understanding is that it's only used for canonicalizing the file path for 
storing them in the index and nowhere else.
Should we discourage usages outside the index? Maybe we can give a more obscure 
name for this purpose, e.g. `canonicalizeForIndex` or similar?



================
Comment at: clangd/SourceCode.h:107
+// "/tmp/build/foo.h"
+std::string makeCanonicalPath(llvm::StringRef AbsolutePath,
+                              const SourceManager &SM);
----------------
NIT: maybe name it `canonicalizePath`? A bit shorter. But up to you


================
Comment at: clangd/SourceCode.h:107
+// "/tmp/build/foo.h"
+std::string makeCanonicalPath(llvm::StringRef AbsolutePath,
+                              const SourceManager &SM);
----------------
ilya-biryukov wrote:
> NIT: maybe name it `canonicalizePath`? A bit shorter. But up to you
Maybe put this function right after `getRealPath`? They both "canonicalize" 
paths, so it makes sense for them to live together.


================
Comment at: clangd/index/Background.cpp:90
+
+llvm::SmallString<128> getAbsolutePath(const tooling::CompileCommand &Cmd) {
+  llvm::SmallString<128> AbsolutePath;
----------------
Why not `vfs->makeAbsolute` instead of this function? Maybe worth a comment?


================
Comment at: clangd/index/Background.cpp:189
+      [this, Storage](tooling::CompileCommand Cmd) {
+        Cmd.CommandLine.push_back("-resource-dir=" + ResourceDir);
+        const std::string FileName = Cmd.Filename;
----------------
Ah, it feels we should move this into a helper function and reuse everywhere.
It's a small detail, but it's so easy to forget about it...

Just complaining, no need to do anything about it...


================
Comment at: clangd/index/Background.cpp:259
+      // if this thread sees the older version but finishes later. This should
+      // be rare in practice.
+      DigestIt.first->second = Hash;
----------------
> "should be rare in practice"
And yet, can we avoid this altogether?

Also, I believe it won't be rare. When processing multiple different TUs, we 
can easily run into the same header multiple times, possibly with the different 
contents.
E.g. imagine the index for the whole of clang is being built and the user is 
editing `Sema.h` at the same time. We'll definitely be running into this case 
over and over again.


================
Comment at: clangd/index/Background.cpp:309
     if (std::error_code EC =
             SM.getFileManager().getVirtualFileSystem()->makeAbsolute(AbsPath)) 
{
       elog("Warning: could not make absolute file: {0}", EC.message());
----------------
Does it make sense for it to be part of `makeCanonicalPath`?
Both call sites call `vfs->getAbsolutePath` before calling the function?


================
Comment at: clangd/index/Background.cpp:338
     std::lock_guard<std::mutex> Lock(DigestsMu);
-    if (IndexedFileDigests.lookup(AbsolutePath) == Hash) {
-      vlog("No need to index {0}, already up to date", AbsolutePath);
----------------
This looks like an important optimization. Did we move it to some other place? 
Why should we remove it?


================
Comment at: clangd/index/Background.cpp:365
   StringMap<FileDigest> FilesToUpdate;
+  // Main file always needs to be updated even if it has no symbols.
+  FilesToUpdate[AbsolutePath] = Hash;
----------------
The comment does not mention **why** should we update it. Is it because of the 
dependencies?


================
Comment at: clangd/index/Background.cpp:386
   Action->EndSourceFile();
+  if (!Index.Symbols)
+    return createStringError(inconvertibleErrorCode(), "Uncompilable errors");
----------------
Was this error possible before this change too?


================
Comment at: clangd/index/Background.cpp:389
 
-  log("Indexed {0} ({1} symbols, {2} refs, {3} files)",
-      Inputs.CompileCommand.Filename, Index.Symbols->size(),
-      Index.Refs->numRefs(), Index.Sources->size());
+  vlog("Indexed {0} ({1} symbols, {2} refs, {3} files)",
+       Inputs.CompileCommand.Filename, Index.Symbols->size(),
----------------
This log is not getting called more often after this change, right?
Could you send the `log -> vlog` replacement as a separate patch to keep this 
change more focused? 


================
Comment at: clangd/index/Background.cpp:412
+  // Consumes Symbols and Refs in Shard.
+  auto LoadShard = [this](llvm::StringRef AbsolutePath, IndexFileIn *Shard,
+                          const IncludeGraphNode &IGN) {
----------------
NIT: maybe rename to `AddShardToIndex` or `AddShard`? `LoadShard` is too 
similar to `IndexStorage::loadShard`, which confused me a little when going 
through the code for the first time.


================
Comment at: clangd/index/Background.cpp:481
+          if (!Buf)
+            continue;
+          // If digests match then dependency doesn't need re-indexing.
----------------
maybe log the error? would be nice to somehow recover too, but not sure what we 
can do here.


================
Comment at: clangd/index/Background.cpp:500
+  // Keeps track of the loaded shards to make sure we don't perform redundant
+  // disk io. Keys are AbsolutePaths.
+  llvm::StringSet<> LoadedShards;
----------------
NIT: `s/io/IO`
NIT: `s/AbsolutePaths/absolute paths` (otherwise looks like we're referring to 
a local variable or a field with this name)


================
Comment at: clangd/index/Background.cpp:502
+  llvm::StringSet<> LoadedShards;
+  for (const auto &File : ChangedFiles) {
+    ProjectInfo PI;
----------------
We used to shuffle the `ChangedFiles` before processing them. I believe the 
reason for doing that are still relevant. Should we restore this? (Or are we 
doing it somewhere else now and I missed it?)


================
Comment at: clangd/index/Background.cpp:508
+    BackgroundIndexStorage *IndexStorage = IndexStorageFactory(PI.SourceRoot);
+    auto Dependencies = loadShard(*Cmd, IndexStorage, LoadedShards);
+    for (const auto &Dependency : Dependencies) {
----------------
Maybe return a `vector<string>` with dependencies that need reindexing?  We 
ignore the dependencies that don't need to be reindexed anyway.


================
Comment at: clangd/index/Background.cpp:511
+      // FIXME: Currently, we simply schedule indexing on a TU whenever any of
+      // its dependencies needs re-indexing. We might do it more smartly by
+      // figuring out a minimal set of TUs that will cover all the stale
----------------
NIT: `s/more smartly/smarter`


================
Comment at: clangd/index/Background.cpp:514
+      // dependencies.
+      if (Dependency.second && !BeingReindexed.count(Dependency.first)) {
+        vlog("Enqueueing {0} for {1}", Cmd->Filename, Dependency.first);
----------------
What is `Dependency.second` and `Dependency.first`? Maybe use a named struct 
instead of a pair here or deconstruct into named variables?


================
Comment at: clangd/index/Background.cpp:519
+        // try to re-index those.
+        std::for_each(Dependencies.begin(), Dependencies.end(),
+                      [&BeingReindexed](decltype(Dependency) Dependency) {
----------------
Maybe use range-based-for instead? Looks simpler:
```
for (auto Dep : Dependencies)
  BeingReindexed.insert(Dep.first);
```


================
Comment at: clangd/index/Background.h:108
   BackgroundIndexStorage::Factory IndexStorageFactory;
+  // Loads the shards for all the sources and headers of the Cmd. Returns the
+  // list of dependencies for this Cmd and whether they need to be re-indexed.
----------------
Could you elaborate what are the "sources and headers of the Cmd"? The compile 
command is typically created for a **single** source file, so this comment 
looks a bit confusing.


================
Comment at: clangd/index/Background.h:111
+  std::vector<
+      std::pair</*Path to dependency*/ std::string, /*NeedsReIndexing*/ bool>>
+  loadShard(const tooling::CompileCommand &Cmd,
----------------
Consider creating a simple struct for this pair. The named field access is much 
more readable than `.first` and `.second`


================
Comment at: clangd/index/Background.h:114
+            BackgroundIndexStorage *IndexStorage,
+            llvm::StringSet<> &FailedShards);
+  llvm::Error loadShards(std::vector<std::string> ChangedFiles);
----------------
Should this be named `VisitedShards`?


================
Comment at: clangd/index/BackgroundIndexStorage.cpp:80
     OS.close();
+    vlog("Written shard {0} - {1}", ShardPath, ShardIdentifier);
     return llvm::errorCodeToError(OS.error());
----------------
NIT: maybe send in as a separate patch to keep this more focused? (No need to 
send for review, just commit it)


Repository:
  rCTE Clang Tools Extra

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55224/new/

https://reviews.llvm.org/D55224



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to