rjmccall added inline comments.
================ Comment at: clang/test/CodeGen/X86/Float16-arithmetic.c:207 +// CHECK-NEXT: [[EXT:%.*]] = fpext half [[TMP0]] to float +// CHECK-NEXT: store float [[EXT]], ptr [[RETVAL]], align 2 +// CHECK-NEXT: [[TMP1:%.*]] = load half, ptr [[RETVAL]], align 2 ---------------- pengfei wrote: > rjmccall wrote: > > zahiraam wrote: > > > zahiraam wrote: > > > > pengfei wrote: > > > > > Not sure if we need a fptrunc and store the half value. The following > > > > > tests have the same problem. > > > > I think that's what we want? > > > > // CHECK-LABEL: @RealOp( > > > > // CHECK-NEXT: entry: > > > > // CHECK-NEXT: [[A_ADDR:%.*]] = alloca half, align 2 > > > > // CHECK-NEXT: store half [[A:%.*]], ptr [[A_ADDR]], align 2 > > > > // CHECK-NEXT: [[TMP0:%.*]] = load half, ptr [[A_ADDR]], align 2 > > > > // CHECK-NEXT: [[EXT:%.*]] = fpext half [[TMP0]] to float > > > > // CHECK-NEXT: [[UNPROMOTION:%.*]] = fptrunc float [[EXT]] to half > > > > // CHECK-NEXT: ret half [[UNPROMOTION]] > > > > > > > > Do you agree? If this is correct, I will make the change the other > > > > operators. > > > But I feel like we should be returning a float no? In which case it will > > > be more tricky (need to calculate the Address with the promoted > > > elementype)? @rmjccall? > > The function is declared as returning `_Float16`, not `float`. This is > > therefore a question about when we're allowed to return a value in greater > > precision than the declared return type, which raises three sub-questions: > > one about ABI, one about language semantics, and one about our handling in > > the implementation. > > > > The first is about ABI. This mode is not supposed to be ABI-breaking, so > > whenever the ABI is in doubt, and the target makes a `_Float16` return > > incompatible with the ABI of a `float` return, we must use the former. > > That means, at the very least, returning from a function with unknown call > > sites or calling a function with an unknown implementation. We could > > potentially eliminate extra truncations here when we fully understand a > > call; for example, we could change the return type to `float` when the > > function is internal to a TU and not address-taken, or we could eliminate a > > trunc+extend pair after inlining. It is fair to ask whether that's a good > > idea, however. > > > > Anyway, concretely we're talking about two ABIs here: > > - On x86_64, `_Float16` and `float` are not returned compatibly: they're > > both returned in `xmm0`, but the bit patterns are different, and the caller > > and callee must agree in order to preserve the value. > > - On i386, `_Float16` and `float` *are* returned compatibly: they're both > > returned in `%st0`, promoted to the 80-bit format. > > > > Let's assume for a second that we're interested in avoiding truncations in > > situations where the ABI doesn't limit us. Then we have a question of > > language semantics, which is principally about this: does C's authorization > > of excess precision in intermediate results allows return values to > > propagate the excess precision? The answer that appears to be yes, it > > does, per the explicit footnote at the end of the standard's description of > > the `return` statement: > > > > > The return statement is not an assignment. The overlap restriction of > > > 6.5.16.1 does not apply to the case of function return. The > > > representation of floating-point values can have wider range or precision > > > than implied by the type; a cast can be used to remove this extra range > > > and precision. > > > > Okay. So the third question is about implementation: how should we take > > advantage of this flexibility, assuming we actually want to? A major part > > of the reason we're doing explicit promoted emission in the frontend in > > this patch is that only the frontend has the required knowledge of when to > > force truncation; and a big part of *that* is that explicit casts and > > assignments both force truncation, and neither has a persistent semantic > > representation in LLVM IR. We cannot distinguish between a truncation that > > was only done to satisfy the ABI and a truncation that was required by the > > language semantics. Once we have a value as an `fp16` in IR, it's entirely > > possible that it underwent an operation that required excess precision to > > be discarded, one which no longer has any representation in IR. I think > > the only reasonable way to avoid this would be to allow functions to > > directly return `float` (and receive `float`s as returns) with some sort of > > annotation that it actually has to be returned as an `fp16` to satisfy the > > ABI. And within Clang, we would handle that by making the call/return > > emission interact with the promoted emitters we're adding in this patch. > > > > We can put off all of that until later, though. For now, we should just > > continue to return `_Float16`. > > Do you agree? If this is correct, I will make the change the other > > operators. > > I think it is correct, though it is a bit silly to do fpext + fptrunc for a > simple move. > > > On i386, `_Float16` and `float` *are* returned compatibly > > They are not. According to i386 [[ > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI | psABI ]] both > `_Float16` and `_Complex _Float16` are passed/returned from XMM0, while > `float`/`double` etc. are passed/returned from ST0. > > Other target may have similar problem, e.g., passing/returning `float` from > FPR but `_Float16` from GPR. > > IIUC, the current way, i.e., always promote + unpromote each expression, is > just for easy implementation. I expected we can distinguish unary and binary > operations from multi operations. Only multi operations needs promote + > unpromote. > They are not. According to i386 psABI both _Float16 and _Complex _Float16 are > passed/returned from XMM0, while float/double etc. are passed/returned from > ST0. Ah, I see that now in the register-use table. On the other hand, the section on returning values says "[f]unctions that return scalar floating-point values in registers return them on the top of the x87 register stack, that is, %st0" without making exceptions. The table should presumably take precedence, but still, someone really needs to update that document properly. Anyway. So that puts i386 on the same level as x86_64 in terms of having ABI constraints which do not allow these functions to simply return a `float` without an ABI break, and thus cannot avoid truncations for return values without IPO. > Other target may have similar problem, e.g., passing/returning float from FPR > but _Float16 from GPR. Right, an ABI mismatch should be the default assumption for all targets. The i386 thing where some types are promoted for returns is a terrible legacy that everybody else is well warned off of, and as you point out, even there we don't have to worry about it for `_Float16`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D113107/new/ https://reviews.llvm.org/D113107 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits