[PATCH] D113107: Support of expression granularity for _Float16.

Zahira Ammarguellat via Phabricator via cfe-commits Wed, 17 Nov 2021 12:37:43 -0800

zahiraam added a comment.

In D113107#3138415 <https://reviews.llvm.org/D113107#3138415>, @rjmccall wrote:


> In D113107#3137921 <https://reviews.llvm.org/D113107#3137921>, @zahiraam 
> wrote:
>
>> In D113107#3136464 <https://reviews.llvm.org/D113107#3136464>, @rjmccall 
>> wrote:
>>
>>> Does GCC actually change the formal types of expressions to `float`, or is 
>>> it doing this "no intermediate casts thing" as some sort of 
>>> fp_contract-style custom emission of trees of expressions that involve 
>>> `_Float16`?
>>>
>>> In any case, doing operation-by-operation emulation seems like the right 
>>> first approach rather than starting by doing the less conformant thing.
>>
>> I have created another patch https://reviews.llvm.org/D114099 that does the 
>> first step.
>>
>> Not sure what you mean by "no intermediate casts thing".
>
> I think we keep dancing around this in this review, so let me go back and 
> start from the basics.  There are four approaches I know of for evaluating a 
> homogeneous `_Float16` expression like `a + b + c`:
>
> 1. You can perform each operation with normal `_Float16` semantics.  Ideally, 
> you would have hardware support for this.  If that isn't available, you can 
> emulate the operations in software.  It happens to be true that, for the 
> operations (`+`, `-`, `*`, `/`, `sqrt`) on `_Float16`, this emulation can 
> just involve converting to e.g. `float`, doing the operation, and immediately 
> converting back.  The core property of this approach is that there are no 
> detectable differences from hardware support.
>
> 2. As a slight twist on approach #1, you can ignore the differences between 
> native `_Float16` and emulation with `float`; instead, you just always do 
> arithmetic in `float`.  This potentially changes the result in some cases; 
> e.g. Steve Canon tells me that FMAs on `float` avoid some rounding errors 
> that FMAs on `_Float16` fall subject to.
>
> 3. Approaches #1 and #2 require a lot of intermediate conversions when 
> hardware isn't available.  In our example, `a + b + c` has to be calculated 
> as `(_Float16) ((float) (_Float16) ((float) a + (float) b) + (float) c)`, 
> where the result of one addition is converted down and then converted back 
> again.  You can avoid this by specifically recognizing this pattern and 
> eliminating the conversion from sub-operations that happen to be of type 
> `float`, so that in our example, `a + b + c` would be calculated as 
> `(_Float16) ((float) a + (float) b + (float) c)`.  This is actually allowed 
> by the C standard by default as a form of FP contraction; in fact, I believe 
> C's rules for FP contraction were originally designed for exactly this kind 
> of situation, except that it was emulating `float` with `double` on hardware 
> that only provided arithmetic on the latter.  Obviously, this can change 
> results.
>
> 4. The traditional language rule for `__fp16` is superficially similar to 
> Approach #3 in terms of generated code, but it has some subtle differences in 
> terms of the language.  `__fp16` is immediately promoted to `float` whenever 
> it appears as an arithmetic operand.  What this means is that operations are 
> performed in `float` but then not formally converted back (unless they're 
> used in a context which requires a value of the original type, which entails 
> a normal conversion, just as if you assigned a `double` into a `float` 
> variable).  Thus, for example, `a + b + c` would actually have type `float`, 
> not type `__fp16`.
>
> What this patch is doing to `_Float16` is approach #4, basically treating it 
> like `__fp16`.  That is non-conformant, and it doesn't seem to be what GCC 
> does.  You can see that quite clearly here: https://godbolt.org/z/55oaajoax
>
> What I believe GCC is doing (when not forbidden by `-fexcess-precision`) is 
> approach #3: basically, FP contraction on expressions of `_Float16` type.

@rjmccall Thanks for the clarification. That's very helpful. This patch is 
indeed doing #3, but that was my understanding from @pengfei.: doing for 
_Float16 with no avx512fp16, what _fp16 is doing for expressions.  But really 
we want to follow gcc on this.



================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:1315
+  if ((SrcType->isHalfType() || iSFloat16Allowed) &&
+      !CGF.getContext().getLangOpts().NativeHalfType) {
     // Cast to FP using the intrinsic if the half type itself isn't supported.
----------------
zahiraam wrote:
> pengfei wrote:
> > zahiraam wrote:
> > > zahiraam wrote:
> > > > pengfei wrote:
> > > > > rjmccall wrote:
> > > > > > pengfei wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > pengfei wrote:
> > > > > > > > > rjmccall wrote:
> > > > > > > > > > Okay, this condition is pretty ridiculous to be repeating 
> > > > > > > > > > in three different places across the compiler.  Especially 
> > > > > > > > > > since you're going to change it when you implement the new 
> > > > > > > > > > option, right?
> > > > > > > > > > 
> > > > > > > > > > Can we state this condition more generally?  I'm not sure 
> > > > > > > > > > why this is so narrowly restricted, and the variable name 
> > > > > > > > > > isn't telling me anything, since `_Float16` must by 
> > > > > > > > > > definition be "allowed" if we have an expression of 
> > > > > > > > > > `_Float16` type.
> > > > > > > > > > since _Float16 must by definition be "allowed" if we have 
> > > > > > > > > > an expression of _Float16 type.
> > > > > > > > > 
> > > > > > > > > _Float16 is allowed only on a few targets. 
> > > > > > > > > https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
> > > > > > > > > By the way, we should update for X86 since it's not limited 
> > > > > > > > > to avx512fp16 now.
> > > > > > > > > _Float16 is allowed only on a few targets.
> > > > > > > > 
> > > > > > > > Yes, I know that.  But if `SrcType->isFloat16Type()` is true, 
> > > > > > > > we must be on one of those targets, because the type doesn't 
> > > > > > > > otherwise exist.
> > > > > > > I see your point now. The problem here is we want to allow the 
> > > > > > > `_Float16` to be used more broadly. But the target doesn't really 
> > > > > > > support it sometime. Currently full arithmatic operations are 
> > > > > > > supported only on target with AVX512FP16.
> > > > > > > We should cast for those targets without AVX512FP16 while avoid 
> > > > > > > to do on AVX512FP16.
> > > > > > I agree that many targets don't natively support arithmetic on this 
> > > > > > format, but x86 is not the first one that does.  Unless I'm 
> > > > > > misunderstanding, we already track this property in Clang's 
> > > > > > TargetInfo as `hasLegalHalfType()`.  `+avx512fp16` presumably ought 
> > > > > > to set this.
> > > > > > 
> > > > > > I'm not sure what the interaction with the `NativeHalfType` LangOpt 
> > > > > > is supposed to be here.  My understanding is that that option is 
> > > > > > just supposed to affect `__fp16`, basically turning it into a 
> > > > > > proper arithmetic type, i.e. essentially `_Float16`.  Whatever 
> > > > > > effect you want to apply to `_Float16` should presumably happen 
> > > > > > even if that option not set.
> > > > > > 
> > > > > > More broadly, I don't think your approach in this patch is correct. 
> > > > > >  The type of operations on `_Float16` should not change based on 
> > > > > > whether the target natively supports `_Float16`.  If we need to 
> > > > > > emulate those operations on targets that don't provide them 
> > > > > > natively, we should do that at a lower level than the type system.
> > > > > > 
> > > > > > The most appropriate place to do that is going to depend on the 
> > > > > > exact semantics we want.
> > > > > > 
> > > > > > If we want to preserve `half` semantics exactly regardless of 
> > > > > > target, we should have Clang's IR generation actually emit `half` 
> > > > > > operations.  Targets that don't support those operations natively 
> > > > > > will have to lower at least some of those operations into 
> > > > > > compiler-rt calls, but that's not at all unprecedented.
> > > > > > 
> > > > > > If we're okay with playing loose for performance reasons, we can 
> > > > > > promote to `float` immediately around individual arithmetic 
> > > > > > operations.  IR generation is probably the most appropriate place 
> > > > > > to do that.  But I'm quite concerned about that making `_Float16` 
> > > > > > feel like an unpredictable/unportable type; it seems to me that 
> > > > > > software emulation is much better.
> > > > > > 
> > > > > > If you're proposing the latter, I think you need to raise that more 
> > > > > > widely than a code review; please make a post on llvm-dev.
> > > > > > we already track this property in Clang's TargetInfo as 
> > > > > > `hasLegalHalfType()`
> > > > > 
> > > > > That sounds a good approch. Thank you.
> > > > > 
> > > > > > The type of operations on `_Float16` should not change based on 
> > > > > > whether the target natively supports `_Float16`. If we need to 
> > > > > > emulate those operations on targets that don't provide them 
> > > > > > natively, we should do that at a lower level than the type system.
> > > > > 
> > > > > Unfortunately, we can't do it at low level. The reason is (I'm not 
> > > > > expert in frontend, just recalled from last disscussion with GCC 
> > > > > folks) we have to do expresssion emulation to respect C/C++ 
> > > > > semantics. GCC has option `-fexcess-precision=16` to match the same 
> > > > > result with native instructions, but the default is 
> > > > > `-fexcess-precision=fast` according to language semantics.
> > > > > 
> > > > > > The most appropriate place to do that is going to depend on the 
> > > > > > exact semantics we want...
> > > > > 
> > > > > Note, we are not simply doing emulation in the frontend. It's 
> > > > > backend's responsibility to emulate a single `half` operation. But 
> > > > > it's frontend's responsibility to choose whether to emit several 
> > > > > `half` operations or emit promote + several `float` operations + 
> > > > > truncate. As described in the title, this patch is doing for the 
> > > > > latter.
> > > > > Okay, this condition is pretty ridiculous to be repeating in three 
> > > > > different places across the compiler.  Especially since you're going 
> > > > > to change it when you implement the new option, right?
> > > > > 
> > > > > Can we state this condition more generally?  I'm not sure why this is 
> > > > > so narrowly restricted, and the variable name isn't telling me 
> > > > > anything, since `_Float16` must by definition be "allowed" if we have 
> > > > > an expression of `_Float16` type.
> > > > 
> > > > I agree  :)  I wasn't aware of this flag. I am still in the process of 
> > > > figuring out how float16 work and what is exactly required for it.
> > > > Yes may be having 2 separate patches is the right way to go. One that 
> > > > turns Float16 on for x86 without the use of +avx512fp16 feature and 
> > > > another one that does the emulation ( half to float -> float 
> > > > arithmetic-> truncation to half). @pengfei ?
> > > > > we already track this property in Clang's TargetInfo as 
> > > > > `hasLegalHalfType()`
> > > > 
> > > > That sounds a good approch. Thank you.
> > > > 
> > > > > The type of operations on `_Float16` should not change based on 
> > > > > whether the target natively supports `_Float16`. If we need to 
> > > > > emulate those operations on targets that don't provide them natively, 
> > > > > we should do that at a lower level than the type system.
> > > > 
> > > > Unfortunately, we can't do it at low level. The reason is (I'm not 
> > > > expert in frontend, just recalled from last disscussion with GCC folks) 
> > > > we have to do expresssion emulation to respect C/C++ semantics. GCC has 
> > > > option `-fexcess-precision=16` to match the same result with native 
> > > > instructions, but the default is `-fexcess-precision=fast` according to 
> > > > language semantics.
> > > > 
> > > > > The most appropriate place to do that is going to depend on the exact 
> > > > > semantics we want...
> > > > 
> > > > Note, we are not simply doing emulation in the frontend. It's backend's 
> > > > responsibility to emulate a single `half` operation. But it's 
> > > > frontend's responsibility to choose whether to emit several `half` 
> > > > operations or emit promote + several `float` operations + truncate. As 
> > > > described in the title, this patch is doing for the latter.
> > > 
> > > The _Float16 type is supported for both C and C++, on x86 systems with 
> > > SSE2 enabled. 
> > > From https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html:
> > > "On x86 targets with SSE2 enabled, **without -mavx512fp16, all operations 
> > > will be emulated by software emulation** and the float instructions. The 
> > > default behavior for FLT_EVAL_METHOD is to keep the intermediate result 
> > > of the operation as 32-bit precision. This may lead to inconsistent 
> > > behavior between software emulation and AVX512-FP16 instructions. Using 
> > > -fexcess-precision=16 will force round back after each operation.
> > > 
> > > Using -mavx512fp16 will generate AVX512-FP16 instructions instead of 
> > > software emulation. The default behavior of FLT_EVAL_METHOD is to round 
> > > after each operation. The same is true with -fexcess-precision=standard 
> > > and -mfpmath=sse. If there is no -mfpmath=sse, 
> > > -fexcess-precision=standard alone does the same thing as before, It is 
> > > useful for code that does not have _Float16 and runs on the x87 FPU."
> > > This is what we are trying to reach out with this patch.
> > > 
> > I planed to enable Float16 and emulation in one patch, i.e., D107082. It's 
> > almost backend work. The only thing I can't do in that patch is emulating 
> > in C/C++ expression granularity, which has to leverage FE facilities. 
> > That's all what I expected for this patch.
> > I'm ashamed I haven't finished it yet and leaving confusion here :(
> @pengfei I propose to re-factor the code in this patch and in D107082 in 
> order to create the first patch. Then when that's done we can work on the 
> emulation?
Please see https://reviews.llvm.org/D114099 and review please. Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D113107: Support of expression granularity for _Float16.

Reply via email to