kadircet added inline comments.

================
Comment at: clangd/index/Background.cpp:259
+      // if this thread sees the older version but finishes later. This should
+      // be rare in practice.
+      DigestIt.first->second = Hash;
----------------
ilya-biryukov wrote:
> kadircet wrote:
> > ilya-biryukov wrote:
> > > kadircet wrote:
> > > > ilya-biryukov wrote:
> > > > > kadircet wrote:
> > > > > > ilya-biryukov wrote:
> > > > > > > > "should be rare in practice"
> > > > > > > And yet, can we avoid this altogether?
> > > > > > > 
> > > > > > > Also, I believe it won't be rare. When processing multiple 
> > > > > > > different TUs, we can easily run into the same header multiple 
> > > > > > > times, possibly with the different contents.
> > > > > > > E.g. imagine the index for the whole of clang is being built and 
> > > > > > > the user is editing `Sema.h` at the same time. We'll definitely 
> > > > > > > be running into this case over and over again.
> > > > > > Well I am open to ideas, but currently there is no way of knowing 
> > > > > > whether this is the newer version for the file. Because only 
> > > > > > information we have is the digest is different than what we 
> > > > > > currently have and this doesn't yield any chronological information.
> > > > > Do we know which request is "newer" when scheduling it?
> > > > > If so, we could keep the map of the **latest** hashes of the files 
> > > > > we've seen (in addition to the `IndexedFileDigests`, which correspond 
> > > > > to the digest for which we built the index IIUC).
> > > > > 
> > > > > This would give us a simple invariant of the final state we want to 
> > > > > bring the index to: `IndexedFileDigests` should correspond to the 
> > > > > latest hashes seen so far. If not, we have to rebuild the index for 
> > > > > the corresponding files. That, in turn, gives us a way to resolve 
> > > > > conflicts: we should never replace with symbols built for the latest 
> > > > > version (hash) of the file we've seen.
> > > > > 
> > > > > Would that work?
> > > > I am not sure if it would work for non-main files. We update the Hash 
> > > > for the main files at each indexing action, so we can safely keep track 
> > > > of the latest versions. But we collect hashes for dependencies while 
> > > > performing the indexing which happens in parallel. Therefore an 
> > > > indexing action triggered earlier might get an up-to-date version of a 
> > > > dependency than an action triggered later-on.
> > > If updates for each TU are atomic (i.e. all files included in the TU are 
> > > updated in a single go) we would get the up-to-date index eventually, 
> > > assuming we rebuild the index when TU dependencies change (we don't 
> > > schedule rebuilds on changes to included header now, but we're planning 
> > > to do so at some point).
> > Sure, that assumption seems more relaxed than the previous one, just wanna 
> > make sure I got it right:
> > Your suggested solution assumes: Dependencies of a TU won't change during 
> > indexing of that TU
> > Whereas the previous assumption was: Files won't change between close 
> > indexing actions
> Exactly. The idea is that eventually we'll start tracking changes and will be 
> able to detect even those cases and rebuild the index accordingly. Just 
> trying to keep us from drifting away from that model too much.
> 
> > files won't change between close indexing actions
> This assumption worked fine for indexing actions running right away, but I 
> think the situation with loading shards is different: the shards we are 
> loading might be old and if we don't want a stale shard (which might be 
> arbitrarily old) to overwrite results of the fresh indexing action. Let me 
> know if I'm missing something here.
Nope, your point seems to be correct


================
Comment at: clangd/index/Background.cpp:312
+      // Skip if file is already up to date.
+      auto DigestIt = IndexedFileDigests.try_emplace(Path);
+      if (!DigestIt.second && DigestIt.first->second == Hash)
----------------
ilya-biryukov wrote:
> We still update digests and symbols non-atomically in that case. Could we do 
> both under the same lock, similar to how it's done in `loadShards`?
Moved update to the end of the loop, but now we might perform unnecessary work 
for building {symbol,ref}slabs for files that are already up-to-date. It 
shouldn't be too much of an issue, as a solution we can lock at the entrance 
and only check if the file was up-to-date, than we update at the exit. Or we 
can keep a snapshot.


Repository:
  rCTE Clang Tools Extra

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55224/new/

https://reviews.llvm.org/D55224



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to