jansvoboda11 marked 3 inline comments as done.
jansvoboda11 added inline comments.


================
Comment at: 
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:50-51
+
+  /// Minimize contents of the file.
+  static void minimize(CachedFileSystemEntry &Entry);
 
----------------
dexonsmith wrote:
> Might be more natural as a non-static:
> ```
> lang=c++
> void minimize();
> ```
> 
Agreed. I originally wanted to minimize diff of the implementation (by being 
able to refer to `Result` as before), but I'm happy making this an instance 
method.


================
Comment at: 
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:69-75
+  llvm::ErrorOr<StringRef> getContents(bool Minimized) const {
+    assert(isValid() && "not initialized");
     if (!MaybeStat)
       return MaybeStat.getError();
     assert(!MaybeStat->isDirectory() && "not a file");
-    assert(isValid() && "not initialized");
-    return Contents->getBuffer();
+    return (Minimized ? MinimizedContents : OriginalContents)->getBuffer();
   }
----------------
dexonsmith wrote:
> I *think* this is thread-safe -- since `Minimized` should be the same as when 
> the local cache entry was created -- but it's a bit subtle.
> 
> The problematic case I am worried about:
> - First use in local cache is non-minimized.
>     - Creates shared cache entry that's not minimized.
> - Some other local cache wants it to be minimized.
> - Later use in local cache is minimized.
>     - Accessing `Minimized` pointer races with the other thread setting it.
> 
> If the local cache is guaranteed to always pass the same value for 
> `Minimized` as when it fist accessed it, then there shouldn't be a problem...
> 
> I wonder if there's a way to make it less subtle?
I extracted the logic into `EntryRef` which now gets returned from the main 
function. It carries the `bool Minimized` bit and makes reasoning about this 
more local. WDYT?


================
Comment at: 
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:106-108
+  std::unique_ptr<llvm::MemoryBuffer> OriginalContents;
+  std::unique_ptr<llvm::MemoryBuffer> MinimizedContents;
   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
----------------
dexonsmith wrote:
> I'm finding it a bit subtle detecting if there are races on access / setting 
> of these, but I think it's correct.
> - I think I verified that they are "set once".
> - All the setters are guarded by locks.
> - The first getter per "local" cache is guarded by a lock.
> - Subsequent getters are not.
> 
> The key question: are the subsequent getters ONLY used when the first getter 
> was successful?
> 
> One way to make it more obvious:
> ```
> lang=c++
>   struct ContentWithPPRanges {
>     std::unique_ptr<llvm::MemoryBuffer> Content;
>     PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
>   };
> 
> private:
>   // Always accessed,mutated under a lock. Not mutated after they escape.
>   std::unique_ptr<llvm::MemoryBuffer> Original;
>   std::unique_ptr<llvm::MemoryBuffer> Minimized;
>   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> 
>   // Used in getters. Pointed-at memory immutable after these are set.
>   std::atomic<const llvm::MemoryBuffer *> OriginalAccess;
>   std::atomic<const llvm::MemoryBuffer *> MinimizedAccess;
>   std::atomic<const PreprocessorSkippedRangeMapping *> PPRangesAccess;
> ```
> I don't think this is the only approach though.
> 
I think there are no races on the original contents. The pointer is 
unconditionally set on creation of `CachedFileSystemEntry` under a lock that no 
threads get past without having set the pointer (or failing and never accessing 
the pointer).

For minimized contents, the latest revision adds check at the beginning of the 
main function (`needsMinimization`) outside the critical section. There are 
three paths I can think of:
* The check returns `true` in thread A (pointer is `null`), thread A enters 
critical section, minimizes the contents and initializes the pointer.
* The check returns `true` in thread A, but thread B entered the critical 
section, minimized contents and initialized the pointer. When thread A enters 
the critical section, it performs the check again, figures that out and skips 
minimization.
* The check returns `false` and the local cache entry is returned.

So there definitely is a race here, but I believe it's benign. Do you agree? Do 
you think it's worth addressing?


================
Comment at: 
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:253
+  if (Entry.isMinimized()) {
+    if (!Entry.getPPSkippedRangeMapping().empty() && PPSkipMappings)
+      (*PPSkipMappings)[Result->Buffer->getBufferStart()] =
----------------
Maybe the `isMinimized()` check should be performed internally in `EntryRef` 
when accessing `getPPSkippedRangeMapping()` to make the race avoidance more 
local...


================
Comment at: 
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:189-191
+    if (ShouldBeMinimized && CacheEntry.hasOriginalContents() &&
+        !CacheEntry.hasMinimizedContents())
+      CachedFileSystemEntry::minimize(CacheEntry);
----------------
dexonsmith wrote:
> Is the DependencyScanningWorkerFilesystem guaranteed to always want either 
> minimized or not minimized?
> 
> IOW, if the same filesystem is reused, and the first time only the original 
> file was needed and later the minimized was needed, I don't see the code path 
> for minimizing the file later. But maybe reusing one of these FSs is not 
> supported?
You're right, minimization of a particular file can be turned on/off on the 
fly. I already have a test for this in D114966 but somehow dropped it when 
extracting this patch. Fixed in the latest revision.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115346/new/

https://reviews.llvm.org/D115346

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to