zahiraam added a comment. In D113107#3138415 <https://reviews.llvm.org/D113107#3138415>, @rjmccall wrote:
> In D113107#3137921 <https://reviews.llvm.org/D113107#3137921>, @zahiraam > wrote: > >> In D113107#3136464 <https://reviews.llvm.org/D113107#3136464>, @rjmccall >> wrote: >> >>> Does GCC actually change the formal types of expressions to `float`, or is >>> it doing this "no intermediate casts thing" as some sort of >>> fp_contract-style custom emission of trees of expressions that involve >>> `_Float16`? >>> >>> In any case, doing operation-by-operation emulation seems like the right >>> first approach rather than starting by doing the less conformant thing. >> >> I have created another patch https://reviews.llvm.org/D114099 that does the >> first step. >> >> Not sure what you mean by "no intermediate casts thing". > > I think we keep dancing around this in this review, so let me go back and > start from the basics. There are four approaches I know of for evaluating a > homogeneous `_Float16` expression like `a + b + c`: > > 1. You can perform each operation with normal `_Float16` semantics. Ideally, > you would have hardware support for this. If that isn't available, you can > emulate the operations in software. It happens to be true that, for the > operations (`+`, `-`, `*`, `/`, `sqrt`) on `_Float16`, this emulation can > just involve converting to e.g. `float`, doing the operation, and immediately > converting back. The core property of this approach is that there are no > detectable differences from hardware support. > > 2. As a slight twist on approach #1, you can ignore the differences between > native `_Float16` and emulation with `float`; instead, you just always do > arithmetic in `float`. This potentially changes the result in some cases; > e.g. Steve Canon tells me that FMAs on `float` avoid some rounding errors > that FMAs on `_Float16` fall subject to. > > 3. Approaches #1 and #2 require a lot of intermediate conversions when > hardware isn't available. In our example, `a + b + c` has to be calculated > as `(_Float16) ((float) (_Float16) ((float) a + (float) b) + (float) c)`, > where the result of one addition is converted down and then converted back > again. You can avoid this by specifically recognizing this pattern and > eliminating the conversion from sub-operations that happen to be of type > `float`, so that in our example, `a + b + c` would be calculated as > `(_Float16) ((float) a + (float) b + (float) c)`. This is actually allowed > by the C standard by default as a form of FP contraction; in fact, I believe > C's rules for FP contraction were originally designed for exactly this kind > of situation, except that it was emulating `float` with `double` on hardware > that only provided arithmetic on the latter. Obviously, this can change > results. > > 4. The traditional language rule for `__fp16` is superficially similar to > Approach #3 in terms of generated code, but it has some subtle differences in > terms of the language. `__fp16` is immediately promoted to `float` whenever > it appears as an arithmetic operand. What this means is that operations are > performed in `float` but then not formally converted back (unless they're > used in a context which requires a value of the original type, which entails > a normal conversion, just as if you assigned a `double` into a `float` > variable). Thus, for example, `a + b + c` would actually have type `float`, > not type `__fp16`. > > What this patch is doing to `_Float16` is approach #4, basically treating it > like `__fp16`. That is non-conformant, and it doesn't seem to be what GCC > does. You can see that quite clearly here: https://godbolt.org/z/55oaajoax > > What I believe GCC is doing (when not forbidden by `-fexcess-precision`) is > approach #3: basically, FP contraction on expressions of `_Float16` type. @rjmccall Thanks for the clarification. That's very helpful. This patch is indeed doing #3, but that was my understanding from @pengfei.: doing for _Float16 with no avx512fp16, what _fp16 is doing for expressions. But really we want to follow gcc on this. ================ Comment at: clang/lib/CodeGen/CGExprScalar.cpp:1315 + if ((SrcType->isHalfType() || iSFloat16Allowed) && + !CGF.getContext().getLangOpts().NativeHalfType) { // Cast to FP using the intrinsic if the half type itself isn't supported. ---------------- zahiraam wrote: > pengfei wrote: > > zahiraam wrote: > > > zahiraam wrote: > > > > pengfei wrote: > > > > > rjmccall wrote: > > > > > > pengfei wrote: > > > > > > > rjmccall wrote: > > > > > > > > pengfei wrote: > > > > > > > > > rjmccall wrote: > > > > > > > > > > Okay, this condition is pretty ridiculous to be repeating > > > > > > > > > > in three different places across the compiler. Especially > > > > > > > > > > since you're going to change it when you implement the new > > > > > > > > > > option, right? > > > > > > > > > > > > > > > > > > > > Can we state this condition more generally? I'm not sure > > > > > > > > > > why this is so narrowly restricted, and the variable name > > > > > > > > > > isn't telling me anything, since `_Float16` must by > > > > > > > > > > definition be "allowed" if we have an expression of > > > > > > > > > > `_Float16` type. > > > > > > > > > > since _Float16 must by definition be "allowed" if we have > > > > > > > > > > an expression of _Float16 type. > > > > > > > > > > > > > > > > > > _Float16 is allowed only on a few targets. > > > > > > > > > https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point > > > > > > > > > By the way, we should update for X86 since it's not limited > > > > > > > > > to avx512fp16 now. > > > > > > > > > _Float16 is allowed only on a few targets. > > > > > > > > > > > > > > > > Yes, I know that. But if `SrcType->isFloat16Type()` is true, > > > > > > > > we must be on one of those targets, because the type doesn't > > > > > > > > otherwise exist. > > > > > > > I see your point now. The problem here is we want to allow the > > > > > > > `_Float16` to be used more broadly. But the target doesn't really > > > > > > > support it sometime. Currently full arithmatic operations are > > > > > > > supported only on target with AVX512FP16. > > > > > > > We should cast for those targets without AVX512FP16 while avoid > > > > > > > to do on AVX512FP16. > > > > > > I agree that many targets don't natively support arithmetic on this > > > > > > format, but x86 is not the first one that does. Unless I'm > > > > > > misunderstanding, we already track this property in Clang's > > > > > > TargetInfo as `hasLegalHalfType()`. `+avx512fp16` presumably ought > > > > > > to set this. > > > > > > > > > > > > I'm not sure what the interaction with the `NativeHalfType` LangOpt > > > > > > is supposed to be here. My understanding is that that option is > > > > > > just supposed to affect `__fp16`, basically turning it into a > > > > > > proper arithmetic type, i.e. essentially `_Float16`. Whatever > > > > > > effect you want to apply to `_Float16` should presumably happen > > > > > > even if that option not set. > > > > > > > > > > > > More broadly, I don't think your approach in this patch is correct. > > > > > > The type of operations on `_Float16` should not change based on > > > > > > whether the target natively supports `_Float16`. If we need to > > > > > > emulate those operations on targets that don't provide them > > > > > > natively, we should do that at a lower level than the type system. > > > > > > > > > > > > The most appropriate place to do that is going to depend on the > > > > > > exact semantics we want. > > > > > > > > > > > > If we want to preserve `half` semantics exactly regardless of > > > > > > target, we should have Clang's IR generation actually emit `half` > > > > > > operations. Targets that don't support those operations natively > > > > > > will have to lower at least some of those operations into > > > > > > compiler-rt calls, but that's not at all unprecedented. > > > > > > > > > > > > If we're okay with playing loose for performance reasons, we can > > > > > > promote to `float` immediately around individual arithmetic > > > > > > operations. IR generation is probably the most appropriate place > > > > > > to do that. But I'm quite concerned about that making `_Float16` > > > > > > feel like an unpredictable/unportable type; it seems to me that > > > > > > software emulation is much better. > > > > > > > > > > > > If you're proposing the latter, I think you need to raise that more > > > > > > widely than a code review; please make a post on llvm-dev. > > > > > > we already track this property in Clang's TargetInfo as > > > > > > `hasLegalHalfType()` > > > > > > > > > > That sounds a good approch. Thank you. > > > > > > > > > > > The type of operations on `_Float16` should not change based on > > > > > > whether the target natively supports `_Float16`. If we need to > > > > > > emulate those operations on targets that don't provide them > > > > > > natively, we should do that at a lower level than the type system. > > > > > > > > > > Unfortunately, we can't do it at low level. The reason is (I'm not > > > > > expert in frontend, just recalled from last disscussion with GCC > > > > > folks) we have to do expresssion emulation to respect C/C++ > > > > > semantics. GCC has option `-fexcess-precision=16` to match the same > > > > > result with native instructions, but the default is > > > > > `-fexcess-precision=fast` according to language semantics. > > > > > > > > > > > The most appropriate place to do that is going to depend on the > > > > > > exact semantics we want... > > > > > > > > > > Note, we are not simply doing emulation in the frontend. It's > > > > > backend's responsibility to emulate a single `half` operation. But > > > > > it's frontend's responsibility to choose whether to emit several > > > > > `half` operations or emit promote + several `float` operations + > > > > > truncate. As described in the title, this patch is doing for the > > > > > latter. > > > > > Okay, this condition is pretty ridiculous to be repeating in three > > > > > different places across the compiler. Especially since you're going > > > > > to change it when you implement the new option, right? > > > > > > > > > > Can we state this condition more generally? I'm not sure why this is > > > > > so narrowly restricted, and the variable name isn't telling me > > > > > anything, since `_Float16` must by definition be "allowed" if we have > > > > > an expression of `_Float16` type. > > > > > > > > I agree :) I wasn't aware of this flag. I am still in the process of > > > > figuring out how float16 work and what is exactly required for it. > > > > Yes may be having 2 separate patches is the right way to go. One that > > > > turns Float16 on for x86 without the use of +avx512fp16 feature and > > > > another one that does the emulation ( half to float -> float > > > > arithmetic-> truncation to half). @pengfei ? > > > > > we already track this property in Clang's TargetInfo as > > > > > `hasLegalHalfType()` > > > > > > > > That sounds a good approch. Thank you. > > > > > > > > > The type of operations on `_Float16` should not change based on > > > > > whether the target natively supports `_Float16`. If we need to > > > > > emulate those operations on targets that don't provide them natively, > > > > > we should do that at a lower level than the type system. > > > > > > > > Unfortunately, we can't do it at low level. The reason is (I'm not > > > > expert in frontend, just recalled from last disscussion with GCC folks) > > > > we have to do expresssion emulation to respect C/C++ semantics. GCC has > > > > option `-fexcess-precision=16` to match the same result with native > > > > instructions, but the default is `-fexcess-precision=fast` according to > > > > language semantics. > > > > > > > > > The most appropriate place to do that is going to depend on the exact > > > > > semantics we want... > > > > > > > > Note, we are not simply doing emulation in the frontend. It's backend's > > > > responsibility to emulate a single `half` operation. But it's > > > > frontend's responsibility to choose whether to emit several `half` > > > > operations or emit promote + several `float` operations + truncate. As > > > > described in the title, this patch is doing for the latter. > > > > > > The _Float16 type is supported for both C and C++, on x86 systems with > > > SSE2 enabled. > > > From https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html: > > > "On x86 targets with SSE2 enabled, **without -mavx512fp16, all operations > > > will be emulated by software emulation** and the float instructions. The > > > default behavior for FLT_EVAL_METHOD is to keep the intermediate result > > > of the operation as 32-bit precision. This may lead to inconsistent > > > behavior between software emulation and AVX512-FP16 instructions. Using > > > -fexcess-precision=16 will force round back after each operation. > > > > > > Using -mavx512fp16 will generate AVX512-FP16 instructions instead of > > > software emulation. The default behavior of FLT_EVAL_METHOD is to round > > > after each operation. The same is true with -fexcess-precision=standard > > > and -mfpmath=sse. If there is no -mfpmath=sse, > > > -fexcess-precision=standard alone does the same thing as before, It is > > > useful for code that does not have _Float16 and runs on the x87 FPU." > > > This is what we are trying to reach out with this patch. > > > > > I planed to enable Float16 and emulation in one patch, i.e., D107082. It's > > almost backend work. The only thing I can't do in that patch is emulating > > in C/C++ expression granularity, which has to leverage FE facilities. > > That's all what I expected for this patch. > > I'm ashamed I haven't finished it yet and leaving confusion here :( > @pengfei I propose to re-factor the code in this patch and in D107082 in > order to create the first patch. Then when that's done we can work on the > emulation? Please see https://reviews.llvm.org/D114099 and review please. Thanks. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D113107/new/ https://reviews.llvm.org/D113107 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits