rjmccall added a comment.

For that example, yes, approach #3 would produce exactly the same IR on 
targets that lack direct hardware support for `_Float16` operations.  But 
getting that behavior right in general requires a different implementation 
from the one in this patch, which implements approach #4 and inappropriately 
changes the formal types of expressions.

In contrast, approach #1 would produce IR like this:

  define dso_local arm_aapcscc half @foo(half %a, half %b, half %c) #0 {
  entry:
    %a.addr = alloca half, align 2
    %b.addr = alloca half, align 2
    %c.addr = alloca half, align 2
    store half %a, half* %a.addr, align 2
    store half %b, half* %b.addr, align 2
    store half %c, half* %c.addr, align 2
    %0 = load half, half* %a.addr, align 2
    %conv = fpext half %0 to float
    %1 = load half, half* %b.addr, align 2
    %conv1 = fpext half %1 to float
    %add = fadd float %conv, %conv1
    %trunc = fptrunc float %add to half
    %ext = fpext half %trunc to float
    %2 = load half, half* %c.addr, align 2
    %conv2 = fpext half %2 to float
    %add3 = fadd float %ext, %conv2
    %3 = fptrunc float %add3 to half
    ret half %3
  }

I was under the impression that `-fexcess-precision` had some sort of strict 
mode that forces this pattern, but apparently not, and the choices are just 
between `standard` (truncation is only forced at casts and assignments) and 
`fast` (optimizer has free rein to remove truncations).
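
To make the numerical difference concrete, here is a rough sketch in C. It is 
only an analogy: `float` stands in for `_Float16` and `double` for the wider 
emulation type, so that it compiles on targets without `_Float16` support, and 
the function names are hypothetical. The first function rounds after every 
operation, as in the IR above; the second rounds once, at the final conversion.

```c
/* Analogy only (assumption): float plays the role of _Float16 and double
   the role of the wider type used for emulation. Names are hypothetical. */

/* Round back to the narrow type after every operation, mirroring the
   fptrunc/fpext pair between the two fadd instructions. */
float sum_truncate_each_op(float a, float b, float c) {
    float ab = (float)((double)a + (double)b); /* truncate after first add */
    return (float)((double)ab + (double)c);    /* truncate after second add */
}

/* Keep the intermediate in the wide type and round only at the end. */
float sum_excess_precision(float a, float b, float c) {
    double wide = (double)a + (double)b + (double)c;
    return (float)wide;
}
```

With `a = 16777216.0f` (2^24) and `b = c = 1.0f`, the first function returns 
16777216 because each intermediate sum lands on a tie and rounds to even, 
while the second returns 16777218.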


https://reviews.llvm.org/D113107
