[PATCH] D111720: [clang][deps] Ensure reported context hash is strict

Duncan P. N. Exon Smith via Phabricator via cfe-commits Thu, 21 Oct 2021 16:25:33 -0700

dexonsmith added inline comments.


================
Comment at: clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp:56-58
+  // Ensure the reported context hash is strict.
+  CI.getHeaderSearchOpts().ModulesStrictContextHash = true;
+
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > jansvoboda11 wrote:
> > > jansvoboda11 wrote:
> > > > dexonsmith wrote:
> > > > > IIUC, explicit modules don't really have/need a context hash. Can 
> > > > > related options be stripped out when serializing to `-cc1` when 
> > > > > `ImplicitModules` is false?
> > > > > 
> > > > > Basically, I'm asking if `ModulesStrictContextHash` is a no-op when 
> > > > > `ImplicitModules` is false. If not, can we make it a no-op?
> > > > > (If we can, then maybe rename the field to 
> > > > > `ImplicitModulesStrictContextHash` and audit that no one reads it 
> > > > > when `ImplicitModules` is off...)
> > > > Let me clarify this a bit. You're right that when building explicit 
> > > > modules, we don't care about context hash.
> > > > 
> > > > We do care about using **strict** context hash during the scan though - 
> > > > it's an implementation detail through which we prevent mixing 
> > > > incompatible modules/TUs. (This strict context hash is enabled 
> > > > elsewhere in the dependency scanner.)
> > > > 
> > > > At the end of the scan, we take discovered modules and modify/prune 
> > > > their `CompilerInvocation` (in this function). This can essentially 
> > > > "merge" multiple versions of the same module into one, which is very 
> > > > desirable. But we still want to do it according to the **strict** 
> > > > context hash. We don't want to merge versions with different search 
> > > > paths for example (non-strict context hash). That's what this change 
> > > > ensures.
> > > > 
> > > > Note that we don't **need** to report context hashes to scanner 
> > > > clients. Any other identifier derived from a strict context hash would 
> > > > work.
> > > I think the rename you're suggesting is valid.
> > > 
> > > We //could// strip the `ModulesStrictContextHash` in the scanner: after 
> > > we generate the strict context hash and before we generate the 
> > > command-line. I think that can be done in a NFC follow-up.
> > I think I understand the problem now. I hadn't really put together that the 
> > implicit modules context hash machinery was being used to decide the 
> > artifact location for the explicit module.
> > 
> > I'm concerned this is too subtle and fragile. I'm wondering if the 
> > following more naive solution would work:
> > - Prune/canonicalize the CompilerInvocation (as now).
> > - Write/modify any fields to use placeholders for fields the client has 
> > control over. (Besides `OutputFile`, what else is there?)
> > - Generate the `-cc1` and hash it. That's now the context hash. Return it 
> > to the client.
> > 
> > But maybe that's the whole point of the strict context hash, and if there 
> > are bugs where this would behave differently, we should fix the strict 
> > context hash?
> > 
> > ...
> > 
> > Stepping back, please go ahead and commit this incremental improvement, 
> > after expanding the comment to:
> > 1. More fully explain the context: that the context hash will be generated 
> > from the CompilerInvocation and sent to the client, which then uses it to 
> > decide where to store the artifact. We need to make sure it's strict.
> > 2. Explain why `assert(ModulesStrictContextHash)` would fail, even though 
> > the scan used a strict context hash. (Maybe it'd makes sense to make that 
> > change in a follow-up...?)
> > I'm concerned this is too subtle and fragile. I'm wondering if the 
> > following more naive solution would work:
> > - Prune/canonicalize the CompilerInvocation (as now).
> > - Write/modify any fields to use placeholders for fields the client has 
> > control over. (Besides `OutputFile`, what else is there?)
> 
> In general, we don't know much about the intended filesystem paths. Off the 
> top of my head, we don't know the PCM output file (and values for 
> `-fmodule-file=`), input file (and values for `-fmodule-map-file=`: module 
> map used for building might have different location during the scan), 
> serialized diagnostics file (we can't use the file for a TU that discovered 
> the module first).
> 
> > - Generate the `-cc1` and hash it. That's now the context hash. Return it 
> > to the client.
> >
> > But maybe that's the whole point of the strict context hash, and if there 
> > are bugs where this would behave differently, we should fix the strict 
> > context hash?
> 
> I think that's a valid approach. Just wondering why you see it being less 
> subtle and fragile than strict context hash?
Clients like build systems don't want two different `-cc1`s for the same 
object. This would structurally guarantee that, whereas the strict context hash 
seems best-effort.

Maybe the strict context hash should be generated this way? (Maybe it already 
is?)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111720/new/

https://reviews.llvm.org/D111720

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D111720: [clang][deps] Ensure reported context hash is strict

Reply via email to