On Fri, May 6, 2016 at 2:42 PM, David Majnemer <david.majne...@gmail.com> wrote:
> On Fri, May 6, 2016 at 2:36 PM, Richard Smith via cfe-commits <
> cfe-commits@lists.llvm.org> wrote:
>
>> On Fri, May 6, 2016 at 1:56 PM, Ettore Speziale via cfe-commits <
>> cfe-commits@lists.llvm.org> wrote:
>>
>>> Hello,
>>>
>>> > In the case of foo, there could be a problem.
>>> > If you do not mark it convergent, the LLVM sink pass push the call to
>>> > foo to the then branch of the ternary operator, hence the program has
>>> > been incorrectly optimized.
>>> >
>>> > Really? It looks like the problem is that you lied to the compiler by
>>> > marking the function as 'pure'. The barrier is a side-effect that
>>> > cannot be removed or duplicated, so it's not correct to mark this
>>> > function as pure.
>>>
>>> I was trying to write a very small example to trick LLVM and trigger the
>>> optimization. It is based on Transforms/Sink/convergent.ll:
>>>
>>> define i32 @foo(i1 %arg) {
>>> entry:
>>>   %c = call i32 @bar() readonly convergent
>>>   br i1 %arg, label %then, label %end
>>>
>>> then:
>>>   ret i32 %c
>>>
>>> end:
>>>   ret i32 0
>>> }
>>>
>>> declare i32 @bar() readonly convergent
>>>
>>
>> This example looks wrong to me. It doesn't seem meaningful for a function
>> to be both readonly and convergent, because convergent means the call has
>> some side-effect visible to other threads and readonly means the call has
>> no side-effects visible outside the function.
>>
>> Here is another example:
>>>
>>> void foo0(void);
>>> void foo1(void);
>>>
>>> __attribute__((convergent)) void baz() {
>>>   barrier(CLK_GLOBAL_MEM_FENCE);
>>> }
>>>
>>> void bar(int x, global int *y) {
>>>   if (x < 5)
>>>     foo0();
>>>   else
>>>     foo1();
>>>
>>>   baz();
>>>
>>>   if (x < 5)
>>>     foo0();
>>>   else
>>>     foo1();
>>> }
>>>
>>
>> This one looks a lot more interesting. It looks like 'convergent' is a
>> way of informing LLVM that the call cannot be duplicated, yes? That being
>> the case, how is this attribute different from the existing
>> [[clang::noduplicate]] / __attribute__((noduplicate)) attribute?
>>
>
> I think it has more to do with LLVM's definition of convergent: that you
> really do not want control dependencies changing for a callsite.
>

Hmm, so we can't transform:

  %a = complex_pure_operation1
  %b = complex_pure_operation2
  %c = select i1 %x, i32 %a, i32 %b
  call void @foo(i32 %c) convergent

... into ...

  br i1 %x, label %aa, label %bb
aa:
  %a = complex_pure_operation1
  br label %cont
bb:
  %b = complex_pure_operation2
  br label %cont
cont:
  %c = phi i32 [ %a, %aa ], [ %b, %bb ]
  call void @foo(i32 %c) convergent

?

It looks like we added the noduplicate attribute to clang to support
OpenCL's barrier function. Did we get the semantics for it wrong for its
intended use case?

> http://llvm.org/docs/LangRef.html#function-attributes
>
>
>>
>> Based on Transforms/JumpThreading/basic.ll:
>>>
>>> define void @h_con(i32 %p) {
>>>   %x = icmp ult i32 %p, 5
>>>   br i1 %x, label %l1, label %l2
>>>
>>> l1:
>>>   call void @j()
>>>   br label %l3
>>>
>>> l2:
>>>   call void @k()
>>>   br label %l3
>>>
>>> l3:
>>> ; CHECK: call void @g() [[CON:#[0-9]+]]
>>> ; CHECK-NOT: call void @g() [[CON]]
>>>   call void @g() convergent
>>>   %y = icmp ult i32 %p, 5
>>>   br i1 %y, label %l4, label %l5
>>>
>>> l4:
>>>   call void @j()
>>>   ret void
>>>
>>> l5:
>>>   call void @k()
>>>   ret void
>>> ; CHECK: }
>>> }
>>>
>>> If you do not mark baz convergent, you get this:
>>>
>>> clang -x cl -emit-llvm -S -o - test.c -O0 | opt -mem2reg -jump-threading -S
>>>
>>> define void @bar(i32 %x) #0 {
>>> entry:
>>>   %cmp = icmp slt i32 %x, 5
>>>   br i1 %cmp, label %if.then2, label %if.else3
>>>
>>> if.then2:                                 ; preds = %entry
>>>   call void @foo0()
>>>   call void @baz()
>>>   call void @foo0()
>>>   br label %if.end4
>>>
>>> if.else3:                                 ; preds = %entry
>>>   call void @foo1()
>>>   call void @baz()
>>>   call void @foo1()
>>>   br label %if.end4
>>>
>>> if.end4:                                  ; preds = %if.else3, %if.then2
>>>   ret void
>>> }
>>>
>>> Which is illegal, as the value of x might not be the same for all
>>> work-items.
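
To make the "value of x might not be the same for all work-items" point
concrete, here is a rough host-side C sketch. It is a toy model rather than
how LLVM or any OpenCL runtime works: it assumes work-items synchronize only
when they all reach the same barrier call site, and the names
(barrier_site_original, barrier_site_threaded, all_reach_same_site, the
SITE_* ids) are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ids for each static call to baz() in the IR. */
enum { SITE_SHARED = 0, SITE_THEN = 1, SITE_ELSE = 2 };

/* bar() as written: a single baz() call after the two arms rejoin,
 * so every work-item reaches the same call site regardless of x. */
int barrier_site_original(int x) {
    (void)x;
    return SITE_SHARED;
}

/* bar() after jump threading: baz() has been duplicated into each arm,
 * so the call site a work-item reaches now depends on x. */
int barrier_site_threaded(int x) {
    return (x < 5) ? SITE_THEN : SITE_ELSE;
}

/* Returns true when every work-item (one x value per work-item)
 * reaches the same barrier call site. */
bool all_reach_same_site(int (*site_of)(int), const int *xs, size_t n) {
    for (size_t i = 1; i < n; ++i) {
        if (site_of(xs[i]) != site_of(xs[0]))
            return false;
    }
    return true;
}
```

With two work-items holding x = 3 and x = 7, the original form keeps them at
one call site, while the threaded form sends them to different copies of the
baz() call, which is the situation the convergent attribute is meant to rule
out.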
>>>
>>> I’ll update the patch such as:
>>>
>>> * it uses the example about jump-threading
>>> * it marks the attribute available in OpenCL/Cuda
>>> * it provides the [[clang::convergent]] attribute
>>>
>>> Thanks,
>>> Ettore Speziale
>>>
>>> --------------------------------------------------
>>> Ettore Speziale — Compiler Engineer
>>> speziale.ett...@gmail.com
>>> espezi...@apple.com
>>> --------------------------------------------------
>>>
>>> _______________________________________________
>>> cfe-commits mailing list
>>> cfe-commits@lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>>>
>>
>