arsenm added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:167
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> VersionFile =
+        FS.getBufferForFile(BinPath + "/.hipVersion");
+    if (!VersionFile)
----------------
tra wrote:
> arsenm wrote:
> > yaxunl wrote:
> > > yaxunl wrote:
> > > > yaxunl wrote:
> > > > > tra wrote:
> > > > > > arsenm wrote:
> > > > > > > yaxunl wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > arsenm wrote:
> > > > > > > > > > yaxunl wrote:
> > > > > > > > > > > arsenm wrote:
> > > > > > > > > > > > I don't think the detection should fail if this is 
> > > > > > > > > > > > missing
> > > > > > > > > > > why? this file is unique to HIP installation. If it is 
> > > > > > > > > > > missing, the installation is broken.
> > > > > > > > > > Because I should be able to use this without a complete hip 
> > > > > > > > > > installation. Without a specified version, it should assume 
> > > > > > > > > > the most modern layout. This will for example break 
> > > > > > > > > > pointing --rocm-path at the device library build directory 
> > > > > > > > > > for library tests
> > > > > > > > > I also don't see what value checking the version really 
> > > > > > > > > provides; it may be informative to print it, but I don't 
> > > > > > > > > think it's useful to derive information from it
> > > > > > > > what is the directory structure of your most modern layout?
> > > > > > > /opt/rocm/amdgcn/bitcode/foo.bc
> > > > > > > 
> > > > > > > The plan is to remove this and rely on symlinks in the resource 
> > > > > > > directory though
> > > > > > > I also don't see what value checking the version really provides; 
> > > > > > > it may be informative to print it, but I don't think it's useful 
> > > > > > > to derive information from it
> > > > > > 
> > > > > > In CUDA it's used to detect which back-end features to enable (they 
> > > > > > depend on what's supported by ptxas supplied by that CUDA version). 
> > > > > > I don't think that would be relevant to AMDGPU as it does not need 
> > > > > > external dependencies to generate GPU code.
> > > > > > 
> > > > > > It may be useful for filesystem layout detection. E.g. where to 
> > > > > > find bitcode files and what names to expect. This part seems to be 
> > > > > > relevant for ROCm, but I'm not sure if the layout is closely tied 
> > > > > > to the version.
> > > > > > 
> > > > > We are required to support previous ROCm releases for certain time 
> > > > > range. To do that we need to detect HIP version and enable/disable 
> > > > > certain HIP features based on HIP version when necessary.
> > > > > 
> > > > > Therefore if we have a new directory structure for ROCm, that 
> > > > > directory structure should contain a HIP version file so that we can 
> > > > > detect HIP version.
> > > > > 
> > > > > 
> > > > Currently clang includes some wrapper headers by default, which does 
> > > > not work with ROCm 3.5 since those device functions are defined in HIP 
> > > > headers of ROCm 3.5. To support ROCm 3.5, we have to disable including 
> > > > those wrapper headers. This is one example why we need detect HIP 
> > > > version.
> > > I think we need to separate HIP runtime detection and device library 
> > > detection and also separate hasValidHIPRuntime and hasValidDeviceLibrary. 
> > > ROCm toolchain only need to make sure hasValidDeviceLibrary but HIP 
> > > toolchain need both hasValidDeviceLibrary and hasValidHIPRuntime.
> > Regardless of whether there's a version file or if does anything, I think 
> > the absence of one implies do the most modern thing
> "The most modern thing" is an ever moving target. If the failure modes keep 
> changing it will create a source of new problems every time something 
> changes. Probably not a big issue in this case, but with (eventually) wide 
> enough user base there will be some.
> 
> I assume that AMD does not have much of legacy user base yet for compilations 
> with clang, so defaulting to a recent version may be sensible (unlike CUDA 
> where clang had to deal with a lot of existing CUDA users using old CUDA 
> versions). That said, I'd recommend pinning it to a **specific** version, so 
> the expectations remain stable. Then we can document the failure/fallback in 
> a way users can deal with -- `if no version file is found, clang assumes 
> version x.y and expects to find files X, Y and Z in directory A, B, C. You 
> can specify the locations manually with --something-something-X=A or specify 
> the version with --hip-version=3.7`. 
> 
> Considering that clang is being used from various derived tools, you may not 
> always have the luxury of having the whole ROCm installation around and need 
> to be able to override the expectations via command line flags.
> 
> If we have a way to override the version, then it does not matter all that 
> much which version we pick as a fallback. If the guess is wrong, we can 
> always correct it with a flag.
Unlike CUDA we're not really chasing some 3rd party thing that periodically 
drops unpredictable changes. We know what the most modern thing is on tip of 
tree


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82930/new/

https://reviews.llvm.org/D82930



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to