PiotrFusik marked 2 inline comments as done. PiotrFusik added inline comments.
================ Comment at: clang/lib/Headers/opencl-c.h:15594 +int __ovld sub_group_elect(void); +int __ovld sub_group_non_uniform_all( int predicate ); +int __ovld sub_group_non_uniform_any( int predicate ); ---------------- Anastasia wrote: > PiotrFusik wrote: > > Anastasia wrote: > > > PiotrFusik wrote: > > > > Anastasia wrote: > > > > > PiotrFusik wrote: > > > > > > Anastasia wrote: > > > > > > > PiotrFusik wrote: > > > > > > > > Anastasia wrote: > > > > > > > > > Ideally `convergent` attribute was intended to be used in the > > > > > > > > > non-divergent scenarios. So I don't know if it's going to do > > > > > > > > > what is needed here. Did you look into this already? > > > > > > > > > > > > > > > > > > If we look at the Clang documentation it says: > > > > > > > > > > > > > > > > > > > In languages designed for SPMD/SIMT programming model, e.g. > > > > > > > > > > OpenCL or CUDA, the call instructions of a function with > > > > > > > > > > this attribute must be executed by all work items or > > > > > > > > > > threads in a work group or sub group. > > > > > > > > > > > > > > > > > > > > > > > > > > > I remember @nhaehnle was looking at using `convergent` with > > > > > > > > > operations in the divergent control flow some time ago > > > > > > > > > https://reviews.llvm.org/D68994? I am not sure where this > > > > > > > > > thread ended up and whether we can expect this to work > > > > > > > > > currently? > > > > > > > > Thanks for pointing this out! My understanding of the > > > > > > > > `convergent` attribute was that it's for uniform control flow, > > > > > > > > as per the documentation you cited. > > > > > > > > A quick check shows that Intel Graphics Compiler doesn't suffer > > > > > > > > from this invalid optimization. > > > > > > > > Yet I agree that the functions should be marked somehow. It is > > > > > > > > with `__conv` ? > > > > > > > > > > > > > > > A quick check shows that Intel Graphics Compiler doesn't suffer > > > > > > > > from this invalid optimization. > > > > > > > > > > > > > > Did you check on the examples that @nhaehnle provided in the > > > > > > > review? > > > > > > > > > > > > > > > Yet I agree that the functions should be marked somehow. It is > > > > > > > > with __conv ? > > > > > > > > > > > > > > I am not sure there is a solution to this in the upstream LLVM at > > > > > > > present. I am hoping @nhaehnle can provide us more information. > > > > > > > Alternatively, we could run his examples against the latest LLVM > > > > > > > revision to see if the problem still remains or not. I am not > > > > > > > saying that it should block this review, I am ok that we commit > > > > > > > your patch but if we know already that this functionality can't > > > > > > > be fully supported it would be good to at least create a bug to > > > > > > > OpenCL component to record this. > > > > > > The following example: > > > > > > ``` > > > > > > kernel void test(global int *data) > > > > > > { > > > > > > uint id = (uint) get_global_id(0); > > > > > > if (id < 4) > > > > > > data[id] = sub_group_elect(); > > > > > > else > > > > > > data[id] = sub_group_elect(); > > > > > > } > > > > > > ``` > > > > > > with `clang -S -emit-llvm` emits invalid code both with and without > > > > > > the `convergent` attribute. > > > > > > > > > > > > We don't have this problem in our Intel Graphics Compiler though. > > > > > > That's because we replace the subgroup functions with internal > > > > > > intrinsics early enough. These intrinsics are marked with both the > > > > > > `convergent` and `inaccessiblememonly` attributes. Then in the > > > > > > `SimplifyCFG` pass we prevent optimization if both of these > > > > > > attributes are present: > > > > > > https://github.com/intel/intel-graphics-compiler/blob/master/IGC/Compiler/GenTTI.cpp#L397 > > > > > > > > > > > > I don't understand what can go wrong in the second example (with > > > > > > the jump threading pass) ? > > > > > > > > > > > Thanks for checking. Do you mind creating a bug to clang for now > > > > > under OpenCL component > > > > > https://bugs.llvm.org/enter_bug.cgi?product=clang although we might > > > > > reclassify this for a wider scope later on. > > > > > > > > > > > > > > > > > > > > > Then in the SimplifyCFG pass we prevent optimization if both of > > > > > > these attributes are present: > > > > > > https://github.com/intel/intel-graphics-compiler/blob/master/IGC/Compiler/GenTTI.cpp#L397 > > > > > > > > > > I see. Perhaps we need to come up with a new semantic of `convergent` > > > > > and update LLVM passes... or maybe we have to introduce a new > > > > > attribute. Not sure. I suggest we add a comment to > > > > > https://reviews.llvm.org/D68994 and see if it gets picked up. > > > > > > > > > > > > > > Do you mean a bug for the invalid optimizations of these new subgroup > > > > functions? > > > > I added a comment to https://reviews.llvm.org/D68994. > > > > Do you mean a bug for the invalid optimizations of these new subgroup > > > > functions? > > > > > > yes, you can just copy/paste your test above: > > > > > > > > > > The following example: > > > > > > > > kernel void test(global int *data) > > > > { > > > > uint id = (uint) get_global_id(0); > > > > if (id < 4) > > > > data[id] = sub_group_elect(); > > > > else > > > > data[id] = sub_group_elect(); > > > > } > > > > > > > > with clang -S -emit-llvm emits invalid code both with and without the > > > > convergent attribute. > > > > > > If you can it would be good to highlight what is invalid in IR produced > > > by upstream LLVM now. > > > > > > > > Shall I submit the bug before this change is merged? > The order is not important as long as we have it eventually. :) Submitted https://bugs.llvm.org/show_bug.cgi?id=46199 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79781/new/ https://reviews.llvm.org/D79781 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits