[PATCH] D113107: Support of expression granularity for _Float16.

John McCall via Phabricator via cfe-commits Mon, 22 Aug 2022 12:52:33 -0700

rjmccall added inline comments.

================
Comment at: clang/test/CodeGen/X86/Float16-arithmetic.c:207
+// CHECK-NEXT:    [[EXT:%.*]] = fpext half [[TMP0]] to float
+// CHECK-NEXT:    store float [[EXT]], ptr [[RETVAL]], align 2
+// CHECK-NEXT:    [[TMP1:%.*]] = load half, ptr [[RETVAL]], align 2
----------------
zahiraam wrote:
> zahiraam wrote:
> > pengfei wrote:
> > > Not sure if we need a fptrunc and store the half value. The following 
> > > tests have the same problem.
> > I think that's what we want?
> > // CHECK-LABEL: @RealOp(
> > // CHECK-NEXT:  entry:
> > // CHECK-NEXT:    [[A_ADDR:%.*]] = alloca half, align 2
> > // CHECK-NEXT:    store half [[A:%.*]], ptr [[A_ADDR]], align 2
> > // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[A_ADDR]], align 2
> > // CHECK-NEXT:    [[EXT:%.*]] = fpext half [[TMP0]] to float
> > // CHECK-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[EXT]] to half
> > // CHECK-NEXT:    ret half [[UNPROMOTION]]
> > 
> > Do you agree? If this is correct, I will make the change the other 
> > operators.
> But I feel like we should be returning a float no?  In which case it will be 
> more tricky (need to calculate the Address with the promoted elementype)? 
> @rmjccall?
The function is declared as returning `_Float16`, not `float`.  This is 
therefore a question about when we're allowed to return a value in greater 
precision than the declared return type, which raises three sub-questions: one 
about ABI, one about language semantics, and one about our handling in the 
implementation.

The first is about ABI.  This mode is not supposed to be ABI-breaking, so 
whenever the ABI is in doubt, and the target makes a `_Float16` return 
incompatible with the ABI of a `float` return, we must use the former.  That 
means, at the very least, returning from a function with unknown call sites or 
calling a function with an unknown implementation.  We could potentially 
eliminate extra truncations here when we fully understand a call; for example, 
we could change the return type to `float` when the function is internal to a 
TU and not address-taken, or we could eliminate a trunc+extend pair after 
inlining.  It is fair to ask whether that's a good idea, however.

Anyway, concretely we're talking about two ABIs here:
- On x86_64, `_Float16` and `float` are not returned compatibly: they're both 
returned in `xmm0`, but the bit patterns are different, and the caller and 
callee must agree in order to preserve the value.
- On i386, `_Float16` and `float` *are* returned compatibly: they're both 
returned in `%st0`, promoted to the 80-bit format.

Let's assume for a second that we're interested in avoiding truncations in 
situations where the ABI doesn't limit us.  Then we have a question of language 
semantics, which is principally about this: does C's authorization of excess 
precision in intermediate results allows return values to propagate the excess 
precision?  The answer that appears to be yes, it does, per the explicit 
footnote at the end of the standard's description of the `return` statement:

> The return statement is not an assignment. The overlap restriction of 
> 6.5.16.1 does not apply to the case of function return. The representation of 
> floating-point values can have wider range or precision than implied by the 
> type; a cast can be used to remove this extra range and precision.

Okay.  So the third question is about implementation: how should we take 
advantage of this flexibility, assuming we actually want to?  A major part of 
the reason we're doing explicit promoted emission in the frontend in this patch 
is that only the frontend has the required knowledge of when to force 
truncation; and a big part of *that* is that explicit casts and assignments 
both force truncation, and neither has a persistent semantic representation in 
LLVM IR.  We cannot distinguish between a truncation that was only done to 
satisfy the ABI and a truncation that was required by the language semantics.  
Once we have a value as an `fp16` in IR, it's entirely possible that it 
underwent an operation that required excess precision to be discarded, one 
which no longer has any representation in IR.  I think the only reasonable way 
to avoid this would be to allow functions to directly return `float` (and 
receive `float`s as returns) with some sort of annotation that it actually has 
to be returned as an `fp16` to satisfy the ABI.  And within Clang, we would 
handle that by making the call/return emission interact with the promoted 
emitters we're adding in this patch.

We can put off all of that until later, though.  For now, we should just 
continue to return `_Float16`.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D113107: Support of expression granularity for _Float16.

Reply via email to