pengfei added inline comments.

================
Comment at: clang/docs/LanguageExtensions.rst:852
 ``double`` when passed to ``printf``, so the programmer must explicitly cast it to
 ``double`` before using it with an ``%f`` or similar specifier.
 
----------------
rjmccall wrote:
> pengfei wrote:
> > rjmccall wrote:
> > > Suggested rework:
> > > 
> > > ```
> > > Clang supports three half-precision (16-bit) floating point types: 
> > > ``__fp16``,
> > > ``_Float16`` and ``__bf16``.  These types are supported in all language
> > > modes, but not on all targets:
> > > 
> > > - ``__fp16`` is supported on every target.
> > > 
> > > - ``_Float16`` is currently supported on the following targets:
> > >   * 32-bit ARM (natively on some architecture versions)
> > >   * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
> > >   * AMDGPU (natively)
> > >   * SPIR (natively)
> > >   * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)
> > > 
> > > - ``__bf16`` is currently supported on the following targets:
> > >   * 32-bit ARM
> > >   * 64-bit ARM (AArch64)
> > >   * X86 (when SSE2 is available)
> > > 
> > > (For X86, SSE2 is available on 64-bit and all recent 32-bit processors.)
> > > 
> > > ``__fp16`` and ``_Float16`` both use the binary16 format from IEEE
> > > 754-2008, which provides a 5-bit exponent and an 11-bit significand
> > > (counting the implicit leading 1).  ``__bf16`` uses the `bfloat16
> > > <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_ format,
> > > which provides an 8-bit exponent and an 8-bit significand; this is the 
> > > same
> > > exponent range as `float`, just with greatly reduced precision.
> > > 
> > > ``_Float16`` and ``__bf16`` follow the usual rules for arithmetic
> > > floating-point types.  Most importantly, this means that arithmetic 
> > > operations
> > > on operands of these types are formally performed in the type and produce
> > > values of the type.  ``__fp16`` does not follow those rules: most 
> > > operations
> > > immediately promote operands of type ``__fp16`` to ``float``, and so
> > > arithmetic operations are defined to be performed in ``float`` and so 
> > > result in
> > > a value of type ``float`` (unless further promoted because of other 
> > > operands).
> > > See below for more information on the exact specifications of these types.
> > > 
> > > Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
> > > native hardware support for arithmetic in their corresponding formats.
> > > The exact conditions are described in the lists above.  When compiling 
> > > for a
> > > processor without native support, Clang will perform the arithmetic in
> > > ``float``, inserting extensions and truncations as necessary.  This can be
> > > done in a way that exactly emulates the behavior of hardware support for
> > > arithmetic, but it can require many extra operations.  By default, Clang 
> > > takes
> > > advantage of the C standard's allowances for excess precision in 
> > > intermediate
> > > operands in order to eliminate intermediate truncations within statements.
> > > This is generally much faster but can generate different results from 
> > > strict
> > > operation-by-operation emulation.
> > > 
> > > The use of excess precision can be independently controlled for these two
> > > types with the ``-ffloat16-excess-precision=`` and
> > > ``-fbfloat16-excess-precision=`` options.  Valid values include:
> > > - ``none`` (meaning to perform strict operation-by-operation emulation)
> > > - ``standard`` (meaning that excess precision is permitted under the rules
> > >   described in the standard, i.e. never across explicit casts or 
> > > statements)
> > > - ``fast`` (meaning that excess precision is permitted whenever the
> > >   optimizer sees an opportunity to avoid truncations; currently this has 
> > > no
> > >   effect beyond ``standard``)
> > > 
> > > The ``_Float16`` type is an interchange floating type specified in
> > >  ISO/IEC TS 18661-3:2015 ("Floating-point extensions for C").  It will
> > > be supported on more targets as they define ABIs for it.
> > > 
> > > The ``__bf16`` type is a non-standard extension, but it generally follows
> > > the rules for arithmetic interchange floating types from ISO/IEC TS
> > > 18661-3:2015.  In previous versions of Clang, it was a storage-only type
> > > that forbade arithmetic operations.  It will be supported on more targets
> > > as they define ABIs for it.
> > > 
> > > The ``__fp16`` type was originally an ARM extension and is specified
> > > by the `ARM C Language Extensions 
> > > <https://github.com/ARM-software/acle/releases>`_.
> > > Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``,
> > > not the ARM alternative format.  Operators that expect arithmetic operands
> > > immediately promote ``__fp16`` operands to ``float``.
> > > 
> > > It is recommended that portable code use ``_Float16`` instead of 
> > > ``__fp16``,
> > > as it has been defined by the C standards committee and has behavior that 
> > > is
> > > more familiar to most programmers.
> > > 
> > > Because ``__fp16`` operands are always immediately promoted to ``float``, 
> > > the
> > > common real type of ``__fp16`` and ``_Float16`` for the purposes of the 
> > > usual
> > > arithmetic conversions is ``float``.
> > > 
> > > A literal can be given ``_Float16`` type using the suffix ``f16``. For 
> > > example,
> > > ``3.14f16``.
> > > 
> > > Because default argument promotion only applies to the standard 
> > > floating-point
> > > types, ``_Float16`` values are not promoted to ``double`` when passed as 
> > > variadic
> > > or untyped arguments.  As a consequence, some caution must be taken when 
> > > using
> > > certain library facilities with ``_Float16``; for example, there is no 
> > > ``printf`` format
> > > specifier for ``_Float16``, and (unlike ``float``) it will not be 
> > > implicitly promoted to
> > > ``double`` when passed to ``printf``, so the programmer must explicitly 
> > > cast it to
> > > ``double`` before using it with an ``%f`` or similar specifier.
> > > ```
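> > To check my understanding of the promotion and variadic-argument rules described above, here is a small sketch of my own (assuming a target where both `__fp16` and `_Float16` are available):
> > 
> > ```
> > #include <stdio.h>
> > 
> > int main(void) {
> >   __fp16 h = 1.0;       // arithmetic on __fp16 promotes to float
> >   _Float16 f = 3.14f16; // a literal with the f16 suffix has type _Float16
> > 
> >   // h + h is performed in float and has type float;
> >   // f + f is performed in _Float16 and has type _Float16.
> >   float    sum_h = h + h;
> >   _Float16 sum_f = f + f;
> > 
> >   // _Float16 is not promoted to double as a variadic argument,
> >   // so it must be cast explicitly before printing with %f.
> >   printf("%f %f\n", (double)sum_h, (double)sum_f);
> >   return 0;
> > }
> > ```
> > 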
> > ```
> > Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
> > native hardware support for arithmetic in their corresponding formats.
> > ```
> > 
> > Do you mean ``_Float16``?
> > 
> > ```
> > The exact conditions are described in the lists above.  When compiling for a
> > processor without native support, Clang will perform the arithmetic in
> > ``float``, inserting extensions and truncations as necessary.
> > ```
> > 
> > This conflicts a bit with `These types are supported in all language modes, but not on all targets`.
> > Why do we need to emulate a type that isn't necessarily supported on all targets?
> > 
> > My understanding is that inserting extensions and truncations serves two purposes:
> > 1. Supporting a type that is designed to work on all targets. For now, that is only used for __fp16.
> > 2. Supporting excess-precision=`standard`. This applies to both _Float16 and __bf16.
> > 
> > Do you mean `_Float16`?
> 
> Yes, thank you.  I knew I'd screw that up somewhere.
> 
> > Why do we need to emulate a type that isn't necessarily supported on all targets?
> 
> Would this be clearer?
> 
> ```
> Arithmetic on ``_Float16`` and ``__bf16`` is enabled on some targets that 
> don't
> provide native architectural support for arithmetic on these formats.  These
> targets are noted in the lists of supported targets above.  On these targets,
> Clang will perform the arithmetic in ``float``, inserting extensions and 
> truncations
> as necessary.
> ```
> 
> > My understanding is that inserting extensions and truncations serves two purposes:
> 
> No, I believe we always insert extensions and truncations.  The cases you're 
> describing are places we insert extensions and truncations in the *frontend*, 
> so that the backend doesn't see operations on `half` / `bfloat` at all.  But 
> when these operations do make it to the backend, and there's no direct 
> architectural support for them on the target, the backend still just inserts 
> extensions and truncations so it can do the arithmetic in `float`.  This is 
> clearest in the ARM codegen (https://godbolt.org/z/q9KoGEYqb) because the 
> conversions are just instructions, but you can also see it in the X86 codegen 
> (https://godbolt.org/z/ejdd4P65W): all the runtime functions are just 
> extensions/truncations, and the actual arithmetic is done with `mulss` and 
> `addss`.  This frontend/backend distinction is not something that matters to 
> users, so the documentation glosses over the difference.
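> 
> Conceptually, on a target without native arithmetic, something like
> 
> ```
> _Float16 mul(_Float16 a, _Float16 b) { return a * b; }
> ```
> 
> ends up being evaluated roughly as if it had been written
> 
> ```
> _Float16 mul(_Float16 a, _Float16 b) { return (_Float16)((float)a * (float)b); }
> ```
> 
> with the conversions showing up either as instructions or as calls to runtime helpers.  (That's just a source-level sketch of the lowering, not the exact code we generate.)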
> 
> I haven't done an exhaustive investigation, so it's possible that there are 
> types and targets where we emit a compiler-rt call to do each operation 
> instead, but those compiler-rt functions almost certainly just do an 
> extension to float in the same way, so I don't think the documentation as 
> written would be misleading for those targets, either.
Thanks for the explanation! Sorry, I failed to make the distinction between 
"supported" and "natively supported"; I guess users may be confused at first too.

I agree the documentation should explain the compiler's overall behavior to users. 
I think there are three aspects we want to tell users about:

1. Whether a type is an arithmetic type, and whether it is (natively) supported 
by all targets or just a few;
2. That results for a type may not be consistent across different targets 
and/or excess-precision settings;
3. That the excess-precision control has no effect when the type is natively 
supported by the target.

It would be clearer if we gave such a summary before the detailed explanation.
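
As a concrete example of point 2 (my own sketch, assuming a target without native `_Float16` arithmetic): for a statement like

```
_Float16 r = a * b + c;
```

with `-ffloat16-excess-precision=none` each operation is truncated back to `_Float16`, roughly `(_Float16)((float)(_Float16)((float)a * (float)b) + (float)c)`, while with `standard` the intermediate product is kept in `float` within the statement, roughly `(_Float16)((float)a * (float)b + (float)c)`, so the two settings (and targets with and without native support) can produce different results for the same source.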


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150913/new/

https://reviews.llvm.org/D150913
