Hi Ettore,

Thanks for the example. Up to now we have used noduplicate to prevent this 
erroneous optimisation, but convergent would work equally well. As has been 
pointed out, it is also less restrictive and allows more optimisations in LLVM, 
e.g. unrolling a loop that contains a convergent operation.
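
To make the difference concrete, here is a minimal sketch, assuming Clang's 
noduplicate and convergent function attributes. The names sync_nodup, 
sync_conv, sum4_nodup and sum4_conv are hypothetical, and the stub bodies only 
stand in for a real work-group barrier so that the snippet compiles on its own:

```c
/* Fall back to "attribute not available" on compilers without the query. */
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

#if __has_attribute(noduplicate)
#  define NODUPLICATE __attribute__((noduplicate))
#else
#  define NODUPLICATE
#endif

#if __has_attribute(convergent)
#  define CONVERGENT __attribute__((convergent))
#else
#  define CONVERGENT
#endif

static int sync_count = 0;

/* Hypothetical stand-ins for a barrier-like operation. */
NODUPLICATE void sync_nodup(void) { sync_count++; }
CONVERGENT  void sync_conv(void)  { sync_count++; }

int sum4_nodup(const int *v) {
  int s = 0;
  for (int i = 0; i < 4; ++i) {
    s += v[i];
    sync_nodup(); /* the call must not be duplicated, so no unrolling here */
  }
  return s;
}

int sum4_conv(const int *v) {
  int s = 0;
  for (int i = 0; i < 4; ++i) {
    s += v[i];
    sync_conv(); /* unrolling is allowed: each copy keeps the same control dependences */
  }
  return s;
}
```

Both functions compute the same result. The attributes only change which 
transformations the optimiser is allowed to apply around the calls.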

I think we have a good motivation now for adding the convergent attribute 
explicitly. I am just trying to think whether we need to keep noduplicate at 
all, but I am guessing this is something we can decide later as well.

Cheers,
Anastasia

-----Original Message-----
From: Ettore Speziale [mailto:speziale.ett...@gmail.com] 
Sent: 25 October 2016 23:12
To: Anastasia Stulova
Cc: Ettore Speziale; Liu, Yaxun (Sam); 
reviews+d25343+public+a10e9553b0fc8...@reviews.llvm.org; 
alexey.ba...@intel.com; aaron.ball...@gmail.com; Clang Commits; Sumner, Brian; 
Stellard, Thomas; Arsenault, Matthew; nd
Subject: Re: [PATCH] D25343: [OpenCL] Mark group functions as convergent in 
opencl-c.h

Hello,

> As far as I understand the whole problem is that the optimized functions are 
> marked by __attribute__((pure)). If the attribute is removed from your 
> example, we get LLVM dump preserving correctness:
> 
> define i32 @bar(i32 %x) local_unnamed_addr #0 {
> entry:
>  %call = tail call i32 @foo() #2
>  %tobool = icmp eq i32 %x, 0
>  %.call = select i1 %tobool, i32 0, i32 %call
>  ret i32 %.call
> }

I’ve used __attribute__((pure)) only to force LLVM to apply the transformation 
and show you an example of incorrect behavior.

This is another example:

void foo();
int baz();

int bar(int x) {
  int y;
  if (x) 
    y = baz();
  foo();
  if (x) 
    y = baz();
  return y;
} 

Which gets lowered into:

define i32 @bar(i32) #0 {
  %2 = icmp eq i32 %0, 0
  br i1 %2, label %3, label %4

; <label>:3                                       ; preds = %1
  tail call void (...) @foo() #2
  br label %7

; <label>:4                                       ; preds = %1
  %5 = tail call i32 (...) @baz() #2
  tail call void (...) @foo() #2
  %6 = tail call i32 (...) @baz() #2
  br label %7

; <label>:7                                       ; preds = %3, %4
  %8 = phi i32 [ %6, %4 ], [ undef, %3 ]
  ret i32 %8
}

As you can see, the call sites of foo in the optimized IR are not 
control-equivalent to the only call site of foo in the unoptimized IR. Now 
imagine foo is implemented in another module and contains a call to a 
convergent function, e.g. barrier(). You are going to generate incorrect code.
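
For reference, a minimal sketch of how the example above could be protected, 
assuming the convergent attribute is available in Clang as 
__attribute__((convergent)). The CONVERGENT macro and the stub bodies of foo 
and baz are hypothetical and only there to make the snippet self-contained. In 
the real case foo would live in another module:

```c
/* Fall back to "attribute not available" on compilers without the query. */
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

#if __has_attribute(convergent)
#  define CONVERGENT __attribute__((convergent))
#else
#  define CONVERGENT
#endif

static int foo_calls = 0;
static int baz_calls = 0;

/* Stub: stands in for an external function that internally calls barrier(). */
CONVERGENT void foo(void) { foo_calls++; }

/* Stub: stands in for the external baz(). */
int baz(void) { return ++baz_calls; }

int bar(int x) {
  int y = 0; /* initialised here, unlike the original, to keep the result defined */
  if (x)
    y = baz();
  foo();     /* convergent: must not be made control-dependent on x */
  if (x)
    y = baz();
  return y;
}
```

With the attribute on foo, the optimiser should no longer be allowed to sink 
the call into the two branches, because that would make it control-dependent 
on x.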

Bye

--------------------------------------------------
Ettore Speziale — Compiler Engineer
speziale.ett...@gmail.com
espezi...@apple.com
--------------------------------------------------

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
