kadircet added inline comments.

================
Comment at: clang-tools-extra/clangd/index/FileIndex.cpp:292
+    for (const auto &Sym : *Slab) {
+      if (Sym.Definition)
+        Files.insert(Sym.Definition.FileURI);
----------------
ArcsinX wrote:
> kadircet wrote:
> > it feels weird to choose one or the other here, why not just insert both 
> > (after checking if definition exists, ofc).
> > 
> > We are likely to have a merged symbol information anyway, and the logic 
> > here will likely result in no index owning the header files, unless there 
> > are some symbols *defined* (not just declared) in it.
> > 
> > This will likely result in some overshooting as knowing about a symbols 
> > declaration location doesn't mean indexing that particular file. But so 
> > does the current version, as knowing about definition location might be 
> > because of a merged symbol, rather than indexing of particular file.
> I will try to explain on example:
> 
> - **test.h**
> ```
> void f();
> ```
> - **test.c**
> ```
> #include "test.h"
> void f() { }
> ```
> - compile_commands.json contains a command for **test.c** compilation.
> 
> Scenario:
> - open **test.c**
> - try to find all references for `f()`
> 
> For that scenario result for `find all references` will be incorrect if both 
> (decl and def) files are in the file list because:
> - decl location is inside **test.h**
> - def location is inside **test.c**
> - the file list for the main file index contains **test.h** and **test.c**
> - the main file index does not contain references from **test.h**
> - the background (static) index contains references from **test.c**, but 
> results from the background index will be skipped, because **test.h** is in 
> the main file (dynamic) index file list.
Ah i see. This all seems really fragile :/ We might as well have something like:

a.h:
```
struct Bar;
```

a.cc:
```
#include "a.h"
struct Bar;
```

and now as soon as you open a.cc, all the results from a.h will be gone, 
because canonical declaration location for `Bar` will be in `a.h` and there is 
no definition. (and i believe this kind of forward-decl madness is quite common 
in at least LLVM). 
Even worse, the same will also happen even if you have definition in the 
header, but a forward decl in the main file, so even accepting the file for 
definition wouldn't be enough.

It is currently (i.e. without this patch) working as expected, as main file 
index only owns the information for the "main file" indexing was initiated for. 
This feels like a big regression to me (that I didn't notice initially, sorry 
for that), but I am ready to be convinced otherwise :D

As Sam mentioned what you do here and partitioning logic in FileShardedIndex is 
quite similar (yours undershoot, sharding logic overshoots) but in the sharding 
process we split indexing result of a full TU/preamble, and later on those 
shards will always be used in a merged fashion (e.g. when a.cc and b.cc, both 
including a.h gets indexed, the shard produced for a.h from b.cc won't contain 
any definition locations, but the shard produced for a.cc will know about those 
symbols definition locations and in a merged view those symbols will have all 
the necessary information.), whereas in here the resulting information is used 
in isolation (main file isn't merged with preamble symbols, but mixes Files 
view). Hence causing such regressions.

I think the proper thing to do here is to propagate relevant files with slabs 
on `FileSymbols::update`. What you do here and sharding isn't very different. 
The question is should we have a:
- string File
- optional<string> File
- vector<string> Files

Currently we always have exactly one File associated with all of those slabs as 
we either:
- always do sharding (preamble and background idx)
- even though symbol informations tell otherwise, we've only processed a single 
file (main file idx)

We would need 3rd option if we were to use filesymbols with a monolithic index, 
but we don't. And even if we need such a thing in future it shouldn't be a huge 
change hopefully.


Sorry for the wall of text, I hope I do make sense, but please tell me if I 
misunderstood/missing something.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97535/new/

https://reviews.llvm.org/D97535

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to