jansvoboda11 added inline comments.
================ Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:359-360 + findSharedEntryByUID(llvm::vfs::Status Stat) const { + return SharedCache.getShard(Stat.getName()) + .findEntryByUID(Stat.getUniqueID()); + } ---------------- dexonsmith wrote: > This doesn't look right to me. UIDs should be sharded independently of the > filename they happen to have been reached by; otherwise each filename shard > is developing its own idea of what each UID means. Since UID distribution is > not uniform, probably the UID shard should be chosen by > `hash_value(Stat.getUniqueID()) % NumShards`. > > You could use the same sets of shards for UIDMap and FilenameMap, but since > they're independent I'd probably do: > - UIDCache: sharded by UID: UIDMap and BumpPtrAllocator for entries (and > likely anything else tied to content) > - FilenameCache: sharded by filename: FilenameMap (and perhaps other things > tied to filename?) Hmm, skimming through `RealFileSystem::status`, I saw that it's calling `sys::fs::status` with "follow symlinks" enabled. It made sense to me that the name stored in `llvm::vfs::Status` would match that and refer the fully resolved target entry, not the symlink itself. Seeing as this is not the case, I agree the UID itself should be used for choosing the shard. ================ Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:23-24 + auto MaybeFile = getUnderlyingFS().openFileForRead(Filename); if (!MaybeFile) return MaybeFile.getError(); auto File = std::move(*MaybeFile); ---------------- dexonsmith wrote: > In what circumstances should this return a cached-error TentativeEntry? Any? I don't think the distinction matters at this level. Whether failures should be cached is a decision that's being made one level up. I personally prefer having `TentativeEntry` to be "non-fallible" and explicitly wrapping the whole thing in `ErrorOr`. That makes it easier for others to know what they are working with (i.e. this object cannot represent an error state). Eventually, I think this would make sense for the caches and `CachedFileSystemEntry` too. ================ Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:28-29 auto MaybeStat = File->status(); if (!MaybeStat) return MaybeStat.getError(); auto Stat = std::move(*MaybeStat); ---------------- dexonsmith wrote: > Since the file was opened, should we return cached-error TentativeEntry here, > rather than an error? Let's return an error here and create error `CachedFileSystemEntry` one level up. ================ Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:275 + const auto &Entry1 = getOrEmplaceSharedEntryForUID(std::move(TEntry)); + const auto &Entry2 = getOrInsertSharedEntryForFilename(Filename, Entry1); + return insertLocalEntryForFilename(Filename, Entry2); ---------------- dexonsmith wrote: > jansvoboda11 wrote: > > I'm not sure these should be separate. We could end up in situation where > > the Filename map contains different entry than the UID map for the same > > directory entry. I'm tempted to merge these functions into one and perform > > the updates in a single critical section... > > I'm not sure these should be separate. We could end up in situation where > > the Filename map contains different entry than the UID map for the same > > directory entry. > > I'm also sure precisely what you mean by "for the same directory entry" in > this context; and I don't see what's wrong with the situation I think you're > outlining. > > > I'm tempted to merge these functions into one and perform the updates in a > > single critical section... > > A single critical section for setting UID and filename at the same time would > be hard to get right (and efficient), since UIDs have aliases through other > filenames due to different directory paths (dir/../x.h vs x.h) and filesystem > links (hard and symbolic). > > Here's the race that I think(?) you're worried about: > > - Worker1 does a tentative stat of "x.h", finds a UID that isn't mapped > (UIDX1, but it's ignored...). > - Worker2 does a tentative stat of "x.h", finds a UID that isn't mapped > (UIDX1, but it's ignored...). > - Worker1 opens "x.h", finds ContentX1+StatX1 (with UIDX1), saves mapping > UIDX1 -> ContentX1+StatX1. > - "x.h" changes. > - Worker2 opens "x.h", finds ContentX2+StatX2 (with UIDX2), saves mapping > UIDX2 -> ContentX2+StatX2. > - Worker2 saves mapping "x.h" -> ContentX2+StatX2. > - Both workers move forward with "x.h" -> ContentX2+StatX2. > > IIUC, you're concerned that the mapping UIDX1 -> ContentX1+StatX1 was saved. > The side effect is that if a future tentative stat of (e.g.) "y.h" returns > UIDX1, then "y.h" will be mapped to ContentX1+StatX1. Is this what concerns > you? Why? (Is it something else?) > > The concern I have is that some filesystems recycle UIDs (maybe "x.h" *was* a > symbolic link to "y.h" and then became its own file... or maybe "x.h" and > "y.h" were hard links... or maybe "y.h" is just a new file!). But that's a > problem with using UIDs to detect equivalent filesystem links / content in > general. I don't see any reason to be more concerned here than elsewhere, and > to avoid depending on UID we'd need a pretty different design (e.g., lazily > detect and model directory structure and symbolic links). Yes, that's the kind of scenario I was thinking about. I'm not concerned about consequences of that side effect, I just don't like storing garbage that will most likely never be used/referenced again and might be confusing during debugging. I agree with you on UID recycling... Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D114966/new/ https://reviews.llvm.org/D114966 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits