On Fri, May 6, 2016 at 1:56 PM, Ettore Speziale via cfe-commits <cfe-commits@lists.llvm.org> wrote:
> Hello,
>
> > In the case of foo, there could be a problem.
> > If you do not mark it convergent, the LLVM sink pass pushes the call to
> > foo to the then branch of the ternary operator, hence the program has
> > been incorrectly optimized.
> >
> > Really? It looks like the problem is that you lied to the compiler by
> > marking the function as 'pure'. The barrier is a side-effect that cannot
> > be removed or duplicated, so it's not correct to mark this function as
> > pure.
>
> I was trying to write a very small example to trick LLVM and trigger the
> optimization. It is based on Transforms/Sink/convergent.ll:
>
> define i32 @foo(i1 %arg) {
> entry:
>   %c = call i32 @bar() readonly convergent
>   br i1 %arg, label %then, label %end
>
> then:
>   ret i32 %c
>
> end:
>   ret i32 0
> }
>
> declare i32 @bar() readonly convergent
>

This example looks wrong to me. It doesn't seem meaningful for a function to
be both readonly and convergent, because convergent means the call has some
side-effect visible to other threads and readonly means the call has no
side-effects visible outside the function.

> Here is another example:
>
> void foo0(void);
> void foo1(void);
>
> __attribute__((convergent)) void baz() {
>   barrier(CLK_GLOBAL_MEM_FENCE);
> }
>
> void bar(int x, global int *y) {
>   if (x < 5)
>     foo0();
>   else
>     foo1();
>
>   baz();
>
>   if (x < 5)
>     foo0();
>   else
>     foo1();
> }
>

This one looks a lot more interesting. It looks like 'convergent' is a way of
informing LLVM that the call cannot be duplicated, yes? That being the case,
how is this attribute different from the existing [[clang::noduplicate]] /
__attribute__((noduplicate)) attribute?

> Based on Transforms/JumpThreading/basic.ll:
>
> define void @h_con(i32 %p) {
>   %x = icmp ult i32 %p, 5
>   br i1 %x, label %l1, label %l2
>
> l1:
>   call void @j()
>   br label %l3
>
> l2:
>   call void @k()
>   br label %l3
>
> l3:
> ; CHECK: call void @g() [[CON:#[0-9]+]]
> ; CHECK-NOT: call void @g() [[CON]]
>   call void @g() convergent
>   %y = icmp ult i32 %p, 5
>   br i1 %y, label %l4, label %l5
>
> l4:
>   call void @j()
>   ret void
>
> l5:
>   call void @k()
>   ret void
> ; CHECK: }
> }
>
> If you do not mark baz convergent, you get this:
>
> clang -x cl -emit-llvm -S -o - test.c -O0 | opt -mem2reg -jump-threading -S
>
> define void @bar(i32 %x) #0 {
> entry:
>   %cmp = icmp slt i32 %x, 5
>   br i1 %cmp, label %if.then2, label %if.else3
>
> if.then2:                                 ; preds = %entry
>   call void @foo0()
>   call void @baz()
>   call void @foo0()
>   br label %if.end4
>
> if.else3:                                 ; preds = %entry
>   call void @foo1()
>   call void @baz()
>   call void @foo1()
>   br label %if.end4
>
> if.end4:                                  ; preds = %if.else3, %if.then2
>   ret void
> }
>
> Which is illegal, as the value of x might not be the same for all
> work-items.
>
> I’ll update the patch so that:
>
> * it uses the example about jump-threading
> * it marks the attribute available in OpenCL/CUDA
> * it provides the [[clang::convergent]] attribute
>
> Thanks,
> Ettore Speziale
>
> --------------------------------------------------
> Ettore Speziale — Compiler Engineer
> speziale.ett...@gmail.com
> espezi...@apple.com
> --------------------------------------------------
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>
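
To make the "value of x might not be the same for all work-items" point concrete, here is a rough host-side sketch in plain C. It is only a toy model, not anything from the patch: the SITE_* constants and the three helper functions are hypothetical names invented for illustration. The model assumes a barrier is only well-behaved when every work-item in the group reaches it through the same call site, and it compares the original control flow of bar() with the jump-threaded version quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical call-site IDs: where a work-item executes the call to baz(). */
enum { SITE_MERGED = 0, SITE_THEN = 1, SITE_ELSE = 2 };

/* Original bar(): both branches rejoin before the single baz() call,
 * so every work-item uses the same call site regardless of x. */
static int barrier_site_original(int x) {
    (void)x;
    return SITE_MERGED;
}

/* Jump-threaded bar(): baz() has been duplicated into each branch,
 * so the call site now depends on the per-work-item value of x. */
static int barrier_site_threaded(int x) {
    return (x < 5) ? SITE_THEN : SITE_ELSE;
}

/* Returns true when all n work-items (one x value each) reach the
 * barrier through the same call site under the given control flow. */
static bool group_is_convergent(const int *xs, size_t n, int (*site)(int)) {
    assert(xs != NULL && n > 0);
    for (size_t i = 1; i < n; ++i) {
        if (site(xs[i]) != site(xs[0]))
            return false;
    }
    return true;
}
```

With x values such as {1, 7, 3, 9}, the original control flow keeps the group at one call site, while the threaded version splits it across two. A group whose x values are all below 5 is unaffected, which is why the transformation only goes wrong when x is non-uniform.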