ioeric added a comment. Looks great! Thanks for doing this!
The algorithm looks pretty efficient. It might still make sense to collect some performance numbers for real files with potentially large #include tree (we have plenty of those in our internal codebase :) ================ Comment at: clangd/CodeComplete.cpp:908 + llvm::Optional<IncludeInserter> Inserter; // Available during runWithSema. + llvm::Optional<URIDistance> FileProximity; // Initialized once Sema runs. ---------------- It might make sense to briefly explain somewhere how computational cost is distributed across the workflow. ================ Comment at: clangd/CodeComplete.cpp:938 + SemaCCInput.FileName, SemaCCInput.Contents, + format::getLLVMStyle(), // XXX + SemaCCInput.Command.Directory, ---------------- nit: `Style`? ================ Comment at: clangd/FileDistance.cpp:72 + // Fill in all ancestors between the query and the known. + for (unsigned I = 0; I < Ancestors.size(); ++I) { + if (Cost != kUnreachable) ---------------- nit: this loop would probably be easier to understand if iterated reversely. ================ Comment at: clangd/FileDistance.cpp:86 + return R.first->getSecond(); + if (auto U = clangd::URI::parse(URI)) { + LLVM_DEBUG(dbgs() << "distance(" << URI << ") = distance(" << U->body() ---------------- This is done for every index symbol, so I wonder if we could avoid parsing URIs by assuming some URI structures. ================ Comment at: clangd/FileDistance.h:56 + + FileDistance(llvm::StringMap<unsigned> Roots, + const FileDistanceOptions &Opts = {}); ---------------- nit: It's unclear what the values of Roots are (it can be confused with tree roots or directory roots). ================ Comment at: clangd/Headers.cpp:28 + : SM(SM), Out(Out) { + SM.getMainFileID(); + } ---------------- What does this do? ================ Comment at: clangd/Headers.h:62 + std::vector<Inclusion> MainFileIncludes; + llvm::StringMap<unsigned> includeDepth(llvm::StringRef MainFileName) const; + ---------------- Does this return all transitive includes in the entire TU or just those from `MainFileName`? And what's `MainFileName` here? Is it the same as the main file in a TU? ================ Comment at: clangd/Headers.h:67 + llvm::StringRef IncludedName, + llvm::StringRef IncludedRealName); + ---------------- The real name can be empty. How do we handle empty path? ================ Comment at: clangd/Headers.h:75 + // The paths we want to expose are the RealPathName, so store those too. + unsigned fileNumber(llvm::StringRef Name); + llvm::StringMap<unsigned> NameToIndex; // Values are file numbers. ---------------- Maybe `fileIndex`? ================ Comment at: clangd/Quality.cpp:233 +std::pair<float, unsigned> proximityScore(llvm::StringRef SymbolURI, + URIDistance *D) { ---------------- nit: `static` ================ Comment at: unittests/clangd/FileDistanceTests.cpp:36 + // Ancestor (up+up+up+up) + EXPECT_EQ(D.distance("/"), 22u); + // Sibling (up+down) ---------------- I think this is an interesting case. IIUC, 22u is contributed by 4 ups (4*5) and 1 include (1*2, via `llvm/ADT/StringRef.h`). Suppose `a/b/c/d/e/f.cc` includes `base/x.h`, then the shortest path into directory `other/` is likely to go via `base/x.h`. This might become a problem if `base/x.h` is very commonly used. One solution is probably to penalize edits that hit the root. Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D48441 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits