AlexVlx wrote:

> > as a must have, it allows featureful AMDGCN flavoured SPIR-V to be 
> > produced, where target specific capability is guarded and chosen or 
> > discarded when finalising compilation for a concrete target.
> 
> I understand the reasoning behind providing such mechanisms to guard hardware 
> specific code for targets that are being JITed, but I'm not sure how it would 
> work in the SPIR-V case. As it is described now, you are able to easily 
> remove the instructions within the guarded block. But SPIR-V also provides 
> `OpExtension` and `OpCapability` instructions that specify which features are 
> used in the module; these are placed at the top of it, so the runtime is free 
> to discard any module with an unknown/unsupported capability. Will you also 
> provide a tool to cut those capabilities along with the instructions?
> 

Thank you for the feedback! I might not be getting the question right (in which 
case I apologise in advance!), but I think that for "vanilla" SPIR-V, i.e. not a 
vendor-flavoured one, where one strictly has to deal with Extensions / non-core 
capabilities, we would probably have the following situation:

- The `processor_is` query is not particularly useful, unless we'd start 
defining pseudo-processors, i.e. collections of features, which IMHO would be 
bad / would subvert one of the main benefits of SPIR-V;
- The `is_invocable` query, however, is probably quite useful even there, 
assuming we'd start somewhat more aggressively introducing target-specific 
intrinsics which map to e.g. `Op*`s, and do the reasonable thing of bubbling 
them into Clang via builtins;
- please note that an underlying assumption for our use here is that SPIR-V 
gets reverse translated into LLVM IR; I assume you are interested in the case 
where that assumption does not hold and the transforms would only apply to the 
SPIR-V module. With the caveat that this is somewhat handwavium powered, I will 
suggest that the benefit in that case is the following: if a module containing 
guarded functionality is encountered by a RT that does not support said 
functionality, then instead of fully discarding the module it becomes possible 
to just remove the offending bits, in a way similar to what is done here, and 
still successfully load it.

> My review is not gating here, but just for my curiosity (and to have an 
> opinion about: _"In the end, I will note there is nothing that is actually 
> AMDGPU specific here, so it is possible that in the future, assuming 
> interests from other targets / users, we'd just promote them to generic 
> intrinsics."_ ):
> 
> 1. Is it allowed to use the builtin along with a runtime-known boolean flag? 
> (probably it's also somewhat related to Erich's comments)

No, it is not; these have to be constant foldable. Part of the motivation for 
the awkward design that is giving people pause is to unambiguously constrain 
them to cases where they will be constant foldable, strictly based on their 
value, without having to explain that wrinkle to users. Adding in arbitrary 
run-time values / performing arbitrary ops on these means that we might 
inadvertently allow invalid code through. If the question is more along the 
lines of "at run time a value is passed as an input to the finalisation 
process, i.e. it is constant", then in theory this could be made to work, but 
the interface isn't quite there and it remains dangerous. For completeness, I 
will note that in some sense a run-time value is being passed in the SPIR-V 
case, as the target is only known at run time, although this is not user 
controlled.

> 2. Does this builtin act like a barrier for optimization passes? If so, is a 
> load from the `llvm.amdgcn.is` GV considered to be a barrier, or something else?

In theory, only by virtue of it being a load from an externally initialised 
global variable, with all the implications of that. However, the intention is 
to run the expansion pass as early as possible, immediately after Clang 
CodeGen, so that subsequent passes don't have to deal with these being present 
at all (they are meant to be rather ephemeral, and would only ever come into 
being when targeting `amdgcnspirv`). 

> 3. Is it envisioned that `optnone` (LLVM/SPIR-V) has no effect on the 
> pass/tool that removes target-dependent code?

Yes, this is always enabled for AMDGPU and has to work at -O0 / with `optnone`, 
which is another reason for going for tight constraints. For the currently 
allowed uses, the transform is fairly straightforward and is not too disruptive 
to the module. Furthermore, it does not have to rely on e.g. DCE, inlining or 
constprop being run, which at O0 wouldn't happen. Conversely, were these 
allowed e.g. as arguments to a function call, we'd have to start looking 
through / inlining calls where predicates are passed. That is doable, and we 
have implemented it, but it is intrusive, non-trivial and possibly quite 
expensive (essentially we'd need to clone the callee to avoid messing up user 
provided code, simplify the clone to prevent spurious failures around e.g. 
copying arguments into local allocas, then inline the simplified clone, then 
tidy up, etc.).



https://github.com/llvm/llvm-project/pull/134016
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
