kadircet added inline comments.
================ Comment at: clangd/index/Background.cpp:259 + // if this thread sees the older version but finishes later. This should + // be rare in practice. + DigestIt.first->second = Hash; ---------------- ilya-biryukov wrote: > kadircet wrote: > > ilya-biryukov wrote: > > > kadircet wrote: > > > > ilya-biryukov wrote: > > > > > kadircet wrote: > > > > > > ilya-biryukov wrote: > > > > > > > > "should be rare in practice" > > > > > > > And yet, can we avoid this altogether? > > > > > > > > > > > > > > Also, I believe it won't be rare. When processing multiple > > > > > > > different TUs, we can easily run into the same header multiple > > > > > > > times, possibly with the different contents. > > > > > > > E.g. imagine the index for the whole of clang is being built and > > > > > > > the user is editing `Sema.h` at the same time. We'll definitely > > > > > > > be running into this case over and over again. > > > > > > Well I am open to ideas, but currently there is no way of knowing > > > > > > whether this is the newer version for the file. Because only > > > > > > information we have is the digest is different than what we > > > > > > currently have and this doesn't yield any chronological information. > > > > > Do we know which request is "newer" when scheduling it? > > > > > If so, we could keep the map of the **latest** hashes of the files > > > > > we've seen (in addition to the `IndexedFileDigests`, which correspond > > > > > to the digest for which we built the index IIUC). > > > > > > > > > > This would give us a simple invariant of the final state we want to > > > > > bring the index to: `IndexedFileDigests` should correspond to the > > > > > latest hashes seen so far. If not, we have to rebuild the index for > > > > > the corresponding files. That, in turn, gives us a way to resolve > > > > > conflicts: we should never replace with symbols built for the latest > > > > > version (hash) of the file we've seen. > > > > > > > > > > Would that work? > > > > I am not sure if it would work for non-main files. We update the Hash > > > > for the main files at each indexing action, so we can safely keep track > > > > of the latest versions. But we collect hashes for dependencies while > > > > performing the indexing which happens in parallel. Therefore an > > > > indexing action triggered earlier might get an up-to-date version of a > > > > dependency than an action triggered later-on. > > > If updates for each TU are atomic (i.e. all files included in the TU are > > > updated in a single go) we would get the up-to-date index eventually, > > > assuming we rebuild the index when TU dependencies change (we don't > > > schedule rebuilds on changes to included header now, but we're planning > > > to do so at some point). > > Sure, that assumption seems more relaxed than the previous one, just wanna > > make sure I got it right: > > Your suggested solution assumes: Dependencies of a TU won't change during > > indexing of that TU > > Whereas the previous assumption was: Files won't change between close > > indexing actions > Exactly. The idea is that eventually we'll start tracking changes and will be > able to detect even those cases and rebuild the index accordingly. Just > trying to keep us from drifting away from that model too much. > > > files won't change between close indexing actions > This assumption worked fine for indexing actions running right away, but I > think the situation with loading shards is different: the shards we are > loading might be old and if we don't want a stale shard (which might be > arbitrarily old) to overwrite results of the fresh indexing action. Let me > know if I'm missing something here. Nope, your point seems to be correct ================ Comment at: clangd/index/Background.cpp:312 + // Skip if file is already up to date. + auto DigestIt = IndexedFileDigests.try_emplace(Path); + if (!DigestIt.second && DigestIt.first->second == Hash) ---------------- ilya-biryukov wrote: > We still update digests and symbols non-atomically in that case. Could we do > both under the same lock, similar to how it's done in `loadShards`? Moved update to the end of the loop, but now we might perform unnecessary work for building {symbol,ref}slabs for files that are already up-to-date. It shouldn't be too much of an issue, as a solution we can lock at the entrance and only check if the file was up-to-date, than we update at the exit. Or we can keep a snapshot. Repository: rCTE Clang Tools Extra CHANGES SINCE LAST ACTION https://reviews.llvm.org/D55224/new/ https://reviews.llvm.org/D55224 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits