Hello,

> In the case of foo, there could be a problem.
> If you do not mark it convergent, the LLVM sink pass push the call to foo to
> the then branch of the ternary operator, hence the program has been
> incorrectly optimized.
>
> Really? It looks like the problem is that you lied to the compiler by marking
> the function as 'pure'. The barrier is a side-effect that cannot be removed
> or duplicated, so it's not correct to mark this function as pure.
I was trying to write a very small example to trick LLVM into triggering the
optimization. It is based on Transforms/Sink/convergent.ll:

define i32 @foo(i1 %arg) {
entry:
  %c = call i32 @bar() readonly convergent
  br i1 %arg, label %then, label %end

then:
  ret i32 %c

end:
  ret i32 0
}

declare i32 @bar() readonly convergent

Here is another example:

void foo0(void);
void foo1(void);

__attribute__((convergent)) void baz() {
  barrier(CLK_GLOBAL_MEM_FENCE);
}

void bar(int x, global int *y) {
  if (x < 5)
    foo0();
  else
    foo1();

  baz();

  if (x < 5)
    foo0();
  else
    foo1();
}

It is based on Transforms/JumpThreading/basic.ll:

define void @h_con(i32 %p) {
  %x = icmp ult i32 %p, 5
  br i1 %x, label %l1, label %l2

l1:
  call void @j()
  br label %l3

l2:
  call void @k()
  br label %l3

l3:
; CHECK: call void @g() [[CON:#[0-9]+]]
; CHECK-NOT: call void @g() [[CON]]
  call void @g() convergent
  %y = icmp ult i32 %p, 5
  br i1 %y, label %l4, label %l5

l4:
  call void @j()
  ret void

l5:
  call void @k()
  ret void
; CHECK: }
}

If you do not mark baz convergent, you get this:

clang -x cl -emit-llvm -S -o - test.c -O0 | opt -mem2reg -jump-threading -S

define void @bar(i32 %x) #0 {
entry:
  %cmp = icmp slt i32 %x, 5
  br i1 %cmp, label %if.then2, label %if.else3

if.then2:                                         ; preds = %entry
  call void @foo0()
  call void @baz()
  call void @foo0()
  br label %if.end4

if.else3:                                         ; preds = %entry
  call void @foo1()
  call void @baz()
  call void @foo1()
  br label %if.end4

if.end4:                                          ; preds = %if.else3, %if.then2
  ret void
}

This is illegal: jump threading has duplicated the call to baz into both
branches, and because the value of x might not be the same for all work-items,
different work-items can end up at different copies of the barrier.
I'll update the patch so that:

* it uses the jump-threading example
* it makes the attribute available in OpenCL/CUDA
* it provides the [[clang::convergent]] attribute

Thanks,
Ettore Speziale

--------------------------------------------------
Ettore Speziale — Compiler Engineer
speziale.ett...@gmail.com
espezi...@apple.com
--------------------------------------------------

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits