aganea added inline comments.
================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:84
+  DependencyScanningFilesystemSharedCache() {
+    NumShards = 8;
+    CacheShards = llvm::make_unique<CacheShard[]>(NumShards);
----------------
arphaman wrote:
> aganea wrote:
> > This line needs a comment. Is this value based on empirical results across different hardware? (I can't recall your answer at the LLVM conf.) I am wondering what a good strategy would be here. The more cores/HT in your PC, the higher the chance that you'll hit the same shard, thus locking. But the bigger we make this value, the more `StringMaps` we will have, and possibly more cache misses.
> > Should it be something like `llvm::hardware_concurrency() / some_fudge`? It'd be interesting to subsequently profile on high core count machines (or maybe you have done that).
> I rewrote it to use a heuristic we settled on after doing empirical testing on an 18 core / 36 thread machine ( max(2, thread_count / 4) ), and added a comment. This was the number `9` (36 / 4) I mentioned at the talk, so we only arrived at it through the heuristic. I think that after some SDK/OS updates the effect of adding more shards is now less pronounced, so it mostly flatlines past some number between 5 and 10. A better heuristic would probably be OS-specific, to take the cost of lock contention into account.
>
> Note that an increased number of shards does not increase the number of cache misses, because the shard # is determined by the filename (we don't actually have global cache misses, as the cache is designed to have only one miss per file, when it's first accessed)! It's just that an increased number of shards doesn't improve performance after hitting some specific limit, so we want to find a point where we can taper it off.
>
> It would definitely still be interesting to profile it on other high core count machines with different OSes to see if it's a reasonable heuristic for those scenarios too.
I'll give it a try on Windows 10; one project here has a large codebase and some wild machines to test on.
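
[Editor's note: to make the sharding scheme discussed above concrete, here is a minimal, self-contained sketch. It is not the patch's actual implementation: the class name `FileShardCache` and the use of `std::unordered_map`/`std::mutex` in place of the patch's `CacheShard`/`llvm::StringMap` are illustrative assumptions. It shows the two properties from the thread: the shard count heuristic max(2, thread_count / 4), and shard selection keyed off the filename, so each file still misses exactly once no matter how many shards exist.]

```
#include <algorithm>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

class FileShardCache {
  struct Shard {
    std::mutex Lock;
    // Filename -> (e.g. minimized) file contents.
    std::unordered_map<std::string, std::string> Entries;
  };

  std::vector<std::unique_ptr<Shard>> Shards;

public:
  FileShardCache() {
    // The heuristic from the review: max(2, thread_count / 4). On the
    // 18 core / 36 thread test machine this yields 9 shards.
    unsigned ThreadCount = std::max(1u, std::thread::hardware_concurrency());
    unsigned NumShards = std::max(2u, ThreadCount / 4);
    Shards.reserve(NumShards);
    for (unsigned I = 0; I < NumShards; ++I)
      Shards.push_back(std::make_unique<Shard>());
  }

  // The shard is picked from the filename, so a given file always maps to
  // the same shard: there is exactly one miss per file regardless of the
  // shard count. Adding shards only reduces lock contention.
  const std::string &get(const std::string &Filename,
                         std::function<std::string()> Compute) {
    Shard &S = *Shards[std::hash<std::string>{}(Filename) % Shards.size()];
    std::lock_guard<std::mutex> Guard(S.Lock);
    auto It = S.Entries.find(Filename);
    if (It == S.Entries.end())
      It = S.Entries.emplace(Filename, Compute()).first;
    return It->second;
  }
};
```

Because the shard is a pure function of the filename, growing the shard count never adds misses; past the contention knee (the 5-10 range mentioned above) it simply stops helping, which is why the heuristic tapers off rather than scaling linearly with core count.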
================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:156
+/// this subclass.
+class MinimizedVFSFile final : public llvm::vfs::File {
+public:
----------------
This creates only short-lived objects, is that right? Just during the call to `CachedFileSystemEntry::createFileEntry`?


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp:94
+  // Add any filenames that were explicity passed in the build settings and
+  // that might be might be opened, as we want to ensure we don't run source
+  // minimization on them.
----------------
"might be" twice.


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp:148
+  RealFS = new ProxyFileSystemWithoutChdir(llvm::vfs::getRealFileSystem());
+  if (Service.getMode() == ScanningMode::MinimizedSourcePreprocessing)
+    DepFS = new DependencyScanningFilesystem(Service.getSharedCache(), RealFS);
----------------
arphaman wrote:
> aganea wrote:
> > Can we not use caching all the time?
> We want to have a mode that's as close to the regular `clang -E` invocation as possible, for correctness CI and testing. We also haven't evaluated the cost of keeping non-minimized sources in memory, so it might be too expensive for practical use. I can add a third option that caches but doesn't minimize as a follow-up patch, though.
Yes, that would be nice.

As for the size taken in RAM: it would be only .H files, right? For Clang+LLVM+LLD I'm counting about 40 MB. But indeed, with a large project that would be about 1.5 GB of .H files, although I doubt all these files will be loaded into memory at once (I'll check).

As for the usage: Fastbuild works like distcc (plain mode, not pump), so we were also planning on extracting the fully preprocessed output, not only the dependencies. There's one use-case where we need to preprocess locally, then send the preprocessed output remotely for compilation. And another use-case where we only want to extract the dependency list and compute a digest, to retrieve the OBJ from a network cache.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63907/new/

https://reviews.llvm.org/D63907

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits