sammccall added a comment. I realize many of the things I'll object to came from my own prototype, sorry about that :-\ I think/hope I gave you some forewarning about this!
================ Comment at: clang-tools-extra/clangd/Headers.h:137 + // Maps including files (from) to included files (to). + using AbstractIncludeGraph = llvm::DenseMap<unsigned, SmallVector<unsigned>>; + // Decides usage for each file included by EntryPoint based on the set of ---------------- Hey, I recognize this code :-) I think the key ideas here are that: - we're using opaque identifiers for the files, nothing is interesting about a file other than its (include) edges and (directly-referenced) color - the identity of headers is an impl detail of Headers.h rather than being something like a FileID, this allows us to hide messy details of files not having stable identity across preamble->main file I think the second idea is important, but the first one might be a bit naive. I worry it's going to lead to certain rules being hard to implement, or being bundled into Headers.cpp instead of IncludeCleaner. For example: - if a file is not self-contained, how does this affect the algorithm? (There's a FIXME for this, but it's in the wrong file!) - if a file is a standard library entrypoint? - if a file is a standard library impl detail? I think the facts (is a file self-contained) should be part of IncludeStructure, but that we should expose them for IncludeCleaner to deal with, rather than trying to hide them in `markUsed`. This means giving IncludeStructure a wider interface, which is we need to be careful about. It makes it harder to substitute algorithms by swapping out the UsedFunc, but I think this is only a cute trick and not actually important. --- Concretely, I think I'd suggest just extending the public API to expose the "file index" concept: - expose the type `using IncludeStructure::File = unsigned` or so - add the file id to `Inclusion` - add File getFile(const FileEntry*) - add `ArrayRef<File> getIncludedFiles(File)` - in future, we can add e.g. `const char* isStandardLibraryEntrypoint(File)` or whatever And implement all of markUsed in IncludeCleaner. (Not 100% sure if we actually still need the `Used` ivar in Inclusion, come to think of it, maybe we just run this code in the diagnostic cycle) ================ Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:144 + std::vector<SourceLocation> Sorted{Locs.begin(), Locs.end()}; + // Group by FileID. + llvm::sort(Sorted); ---------------- nit: move this to be a line comment on the sort() call? I think it's sufficiently nonobvious that sorting groups by file ID that it becomes nonobvious exactly what code the comment refers to! ================ Comment at: clang-tools-extra/clangd/IncludeCleaner.h:50 +/// Retrieves IDs of all files containing SourceLocations from \p Locs. Those +/// locations could be within macro expansions and are not resolved to their +/// spelling locations. ---------------- so when *do* we perform this expansion? Seems like you've wired this up end-to-end in this patch and we're just going to hit the elog case. I think it's reasonable to put this expansion into findReferencedFiles fwiw, it's a fairly simple second pass and in practice combining them won't interfere with reasonable tests ================ Comment at: clang-tools-extra/clangd/IncludeCleaner.h:51 +/// locations could be within macro expansions and are not resolved to their +/// spelling locations. +llvm::DenseSet<FileID> findReferencedFiles(const ReferencedLocations &Locs, ---------------- comment just says spelling, not spelling/expansion, not sure if this is significant. Expansion is actually the more obvious, but I do think we need both. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108194/new/ https://reviews.llvm.org/D108194 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits