aganea added inline comments.

================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:84
+    DependencyScanningFilesystemSharedCache() {
+  NumShards = 8;
+  CacheShards = llvm::make_unique<CacheShard[]>(NumShards);
----------------
arphaman wrote:
> aganea wrote:
> > This line needs a comment. Is this value based on empirical results across 
> > different hardware? (I can't recall your answer at the LLVM conf.) I am 
> > wondering what the right strategy would be here. The more cores/HT in your 
> > PC, the more likely you are to hit the same shard and contend on its lock. 
> > But the bigger we make this value, the more `StringMaps` we will have, and 
> > possibly more cache misses.
> > Should it be something like `llvm::hardware_concurrency() / some_fudge`? 
> > It'd be interesting to subsequently profile on high-core-count machines (or 
> > maybe you have already done that).
> I rewrote it to use a heuristic we settled on after doing empirical testing 
> on an 18-core / 36-thread machine ( max(2, thread_count / 4) ), and added a 
> comment. This gives the number `9` (36 / 4) I mentioned at the talk, so we 
> only arrived at it via the heuristic. I think that, after some SDK/OS 
> updates, the effect of adding more shards is now less pronounced, so 
> performance mostly flatlines past some number between 5 and 10. A better 
> heuristic would probably be OS-specific, to take the cost of lock contention 
> into account.
> 
> Note that an increased number of shards does not increase the number of 
> cache misses, because the shard # is determined by the filename (we don't 
> actually have global cache misses, as the cache is designed to have only one 
> miss per file when it's first accessed)! It's just that an increased number 
> of shards doesn't improve performance after hitting some specific limit, so 
> we want to find the point where we can taper it off.
> 
> It would definitely still be interesting to profile it on other high-core 
> machines with different OSes to see whether it's a reasonable heuristic for 
> those scenarios too.
I'll give it a try on Windows 10; one project here has a large codebase and 
some wild machines to test on.
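
For my own reference, here is a rough sketch of the sharding scheme as I 
understand it from the above (a minimal sketch, not the actual patch: the 
class and member names are placeholders, the map's value type is a stand-in, 
and it assumes the `unsigned llvm::hardware_concurrency()` API mentioned 
earlier):

  #include "llvm/ADT/StringMap.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/Threading.h"
  #include <algorithm>
  #include <memory>
  #include <mutex>

  // One shard: its own lock plus its own StringMap.
  struct CacheShardSketch {
    std::mutex CacheLock;
    llvm::StringMap<int> Cache; // stand-in value type
  };

  class SharedCacheSketch {
    unsigned NumShards;
    std::unique_ptr<CacheShardSketch[]> CacheShards;

  public:
    SharedCacheSketch() {
      // max(2, thread_count / 4), per the empirical testing mentioned above.
      NumShards = std::max(2u, llvm::hardware_concurrency() / 4);
      CacheShards = std::make_unique<CacheShardSketch[]>(NumShards);
    }

    // The shard is picked from the filename, so a given file always maps to
    // the same shard and is only ever "missed" once, on first access.
    CacheShardSketch &getShardForFilename(llvm::StringRef Filename) {
      return CacheShards[llvm::hash_value(Filename) % NumShards];
    }
  };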


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:156
+/// this subclass.
+class MinimizedVFSFile final : public llvm::vfs::File {
+public:
----------------
This only creates short-lived objects, is that right? Just during the call to 
`CachedFileSystemEntry::createFileEntry`?
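
To make sure I understand the intended lifetime, here is roughly what I 
picture (an illustrative sketch only; the `llvm::vfs::File` interface and 
`llvm::MemoryBuffer::getMemBuffer` are real, everything else is hypothetical):

  #include "llvm/ADT/Twine.h"
  #include "llvm/Support/MemoryBuffer.h"
  #include "llvm/Support/VirtualFileSystem.h"

  // A vfs::File that serves an already-minimized, cache-owned buffer. The
  // object itself can be short-lived: it only needs to outlive the
  // status()/getBuffer() calls made by its consumer.
  class MinimizedVFSFileSketch final : public llvm::vfs::File {
    llvm::vfs::Status Stat;
    llvm::StringRef MinimizedContents; // owned by the shared cache

  public:
    MinimizedVFSFileSketch(llvm::vfs::Status Stat, llvm::StringRef Contents)
        : Stat(std::move(Stat)), MinimizedContents(Contents) {}

    llvm::ErrorOr<llvm::vfs::Status> status() override { return Stat; }

    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
    getBuffer(const llvm::Twine &Name, int64_t FileSize,
              bool RequiresNullTerminator, bool IsVolatile) override {
      // No copy: the returned buffer just points into the cached contents.
      return llvm::MemoryBuffer::getMemBuffer(MinimizedContents, Name.str(),
                                              RequiresNullTerminator);
    }

    std::error_code close() override { return std::error_code(); }
  };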


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp:94
+      // Add any filenames that were explicity passed in the build settings and
+      // that might be might be opened, as we want to ensure we don't run source
+      // minimization on them.
----------------
"might be" twice.


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp:148
+  RealFS = new ProxyFileSystemWithoutChdir(llvm::vfs::getRealFileSystem());
+  if (Service.getMode() == ScanningMode::MinimizedSourcePreprocessing)
+    DepFS = new DependencyScanningFilesystem(Service.getSharedCache(), RealFS);
----------------
arphaman wrote:
> aganea wrote:
> > Can we not use caching all the time?
> We want to have a mode where it's as close to the regular `clang -E` 
> invocation as possible, for correctness CI and testing. We also haven't 
> evaluated the cost of keeping non-minimized sources in memory, so it might 
> be too expensive for practical use. I can add a third option that caches 
> but doesn't minimize in a follow-up patch, though.
> 
Yes, that would be nice. As for the size taken in RAM, it would only be .H 
files, right? For Clang+LLVM+LLD I'm counting about 40 MB, but with a large 
project that would indeed be about 1.5 GB of .H files. Although I doubt all 
these files will be loaded into memory at once (I'll check).

As for the usage: Fastbuild works like distcc (plain mode, not pump), so we 
were also planning on extracting the fully preprocessed output, not only the 
dependencies. There's one use-case where we need to preprocess locally, then 
send the preprocessed output remotely for compilation; and another use-case 
where we only want to extract the dependency list and compute a digest, in 
order to retrieve the OBJ from a network cache.
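
To make the follow-up suggestion concrete, something along these lines is 
what I have in mind (only `MinimizedSourcePreprocessing` is taken from the 
patch; the other enumerator names are placeholders):

  enum class ScanningMode {
    // Behave as close to a plain `clang -E` invocation as possible
    // (no caching, no minimization), for correctness CI and testing.
    CanonicalPreprocessing,

    // Proposed third option: cache the original sources in memory,
    // but do not run source minimization on them.
    CachingNonMinimizedPreprocessing,

    // Cache and minimize sources (the fast path in this patch).
    MinimizedSourcePreprocessing,
  };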


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63907/new/

https://reviews.llvm.org/D63907


