pengfei added inline comments.

================
Comment at: clang/test/CodeGen/X86/Float16-arithmetic.c:207
+// CHECK-NEXT:    [[EXT:%.*]] = fpext half [[TMP0]] to float
+// CHECK-NEXT:    store float [[EXT]], ptr [[RETVAL]], align 2
+// CHECK-NEXT:    [[TMP1:%.*]] = load half, ptr [[RETVAL]], align 2
----------------
rjmccall wrote:
> pengfei wrote:
> > rjmccall wrote:
> > > zahiraam wrote:
> > > > zahiraam wrote:
> > > > > pengfei wrote:
> > > > > > Not sure if we need a fptrunc and store the half value. The 
> > > > > > following tests have the same problem.
> > > > > I think that's what we want?
> > > > > // CHECK-LABEL: @RealOp(
> > > > > // CHECK-NEXT:  entry:
> > > > > // CHECK-NEXT:    [[A_ADDR:%.*]] = alloca half, align 2
> > > > > // CHECK-NEXT:    store half [[A:%.*]], ptr [[A_ADDR]], align 2
> > > > > // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[A_ADDR]], align 2
> > > > > // CHECK-NEXT:    [[EXT:%.*]] = fpext half [[TMP0]] to float
> > > > > // CHECK-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[EXT]] to half
> > > > > // CHECK-NEXT:    ret half [[UNPROMOTION]]
> > > > > 
> > > > > Do you agree? If this is correct, I will make the change the other 
> > > > > operators.
> > > > But I feel like we should be returning a float no?  In which case it 
> > > > will be more tricky (need to calculate the Address with the promoted 
> > > > elementype)? @rmjccall?
> > > The function is declared as returning `_Float16`, not `float`.  This is 
> > > therefore a question about when we're allowed to return a value in 
> > > greater precision than the declared return type, which raises three 
> > > sub-questions: one about ABI, one about language semantics, and one about 
> > > our handling in the implementation.
> > > 
> > > The first is about ABI.  This mode is not supposed to be ABI-breaking, so 
> > > whenever the ABI is in doubt, and the target makes a `_Float16` return 
> > > incompatible with the ABI of a `float` return, we must use the former.  
> > > That means, at the very least, returning from a function with unknown 
> > > call sites or calling a function with an unknown implementation.  We 
> > > could potentially eliminate extra truncations here when we fully 
> > > understand a call; for example, we could change the return type to 
> > > `float` when the function is internal to a TU and not address-taken, or 
> > > we could eliminate a trunc+extend pair after inlining.  It is fair to ask 
> > > whether that's a good idea, however.
> > > 
> > > Anyway, concretely we're talking about two ABIs here:
> > > - On x86_64, `_Float16` and `float` are not returned compatibly: they're 
> > > both returned in `xmm0`, but the bit patterns are different, and the 
> > > caller and callee must agree in order to preserve the value.
> > > - On i386, `_Float16` and `float` *are* returned compatibly: they're both 
> > > returned in `%st0`, promoted to the 80-bit format.
> > > 
> > > Let's assume for a second that we're interested in avoiding truncations 
> > > in situations where the ABI doesn't limit us.  Then we have a question of 
> > > language semantics, which is principally about this: does C's 
> > > authorization of excess precision in intermediate results allows return 
> > > values to propagate the excess precision?  The answer that appears to be 
> > > yes, it does, per the explicit footnote at the end of the standard's 
> > > description of the `return` statement:
> > > 
> > > > The return statement is not an assignment. The overlap restriction of 
> > > > 6.5.16.1 does not apply to the case of function return. The 
> > > > representation of floating-point values can have wider range or 
> > > > precision than implied by the type; a cast can be used to remove this 
> > > > extra range and precision.
> > > 
> > > Okay.  So the third question is about implementation: how should we take 
> > > advantage of this flexibility, assuming we actually want to?  A major 
> > > part of the reason we're doing explicit promoted emission in the frontend 
> > > in this patch is that only the frontend has the required knowledge of 
> > > when to force truncation; and a big part of *that* is that explicit casts 
> > > and assignments both force truncation, and neither has a persistent 
> > > semantic representation in LLVM IR.  We cannot distinguish between a 
> > > truncation that was only done to satisfy the ABI and a truncation that 
> > > was required by the language semantics.  Once we have a value as an 
> > > `fp16` in IR, it's entirely possible that it underwent an operation that 
> > > required excess precision to be discarded, one which no longer has any 
> > > representation in IR.  I think the only reasonable way to avoid this 
> > > would be to allow functions to directly return `float` (and receive 
> > > `float`s as returns) with some sort of annotation that it actually has to 
> > > be returned as an `fp16` to satisfy the ABI.  And within Clang, we would 
> > > handle that by making the call/return emission interact with the promoted 
> > > emitters we're adding in this patch.
> > > 
> > > We can put off all of that until later, though.  For now, we should just 
> > > continue to return `_Float16`.
> > > Do you agree? If this is correct, I will make the change the other 
> > > operators.
> > 
> > I think it is correct, though it is a bit silly to do fpext + fptrunc for a 
> > simple move.
> > 
> > > On i386, `_Float16` and `float` *are* returned compatibly
> > 
> > They are not. According to i386 [[ 
> > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI | psABI ]] 
> > both `_Float16` and `_Complex _Float16` are passed/returned from XMM0, 
> > while `float`/`double` etc. are passed/returned from ST0.
> > 
> > Other target may have similar problem, e.g., passing/returning `float` from 
> > FPR but `_Float16` from GPR.
> > 
> > IIUC, the current way, i.e., always promote + unpromote each expression, is 
> > just for easy implementation. I expected we can distinguish unary and 
> > binary operations from multi operations. Only multi operations needs 
> > promote + unpromote.
> > They are not. According to i386 psABI both _Float16 and _Complex _Float16 
> > are passed/returned from XMM0, while float/double etc. are passed/returned 
> > from ST0.
> 
> Ah, I see that now in the register-use table.  On the other hand, the section 
> on returning values says "[f]unctions that return scalar floating-point 
> values in registers return them on the top of the x87 register stack, that 
> is, %st0" without making exceptions.  The table should presumably take 
> precedence, but still, someone really needs to update that document properly.
> 
> Anyway.  So that puts i386 on the same level as x86_64 in terms of having ABI 
> constraints which do not allow these functions to simply return a `float` 
> without an ABI break, and thus cannot avoid truncations for return values 
> without IPO.
> 
> > Other target may have similar problem, e.g., passing/returning float from 
> > FPR but _Float16 from GPR.
> 
> Right, an ABI mismatch should be the default assumption for all targets.  The 
> i386 thing where some types are promoted for returns is a terrible legacy 
> that everybody else is well warned off of, and as you point out, even there 
> we don't have to worry about it for `_Float16`.
> someone really needs to update that document properly.

Filed issue here: https://gitlab.com/x86-psABIs/i386-ABI/-/issues/3

> thus cannot avoid truncations for return values without IPO.

I agree. Actually, no promotion, no truncation. If we can avoid unnecessary 
promotion, we don't need to eliminate truncations anymore.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to