pengfei added inline comments.
================
Comment at: clang/docs/LanguageExtensions.rst:852
``double`` when passed to ``printf``, so the programmer must explicitly cast it to
``double`` before using it with an ``%f`` or similar specifier.
----------------
rjmccall wrote:
> pengfei wrote:
> > rjmccall wrote:
> > > Suggested rework:
> > > 
> > > ```
> > > Clang supports three half-precision (16-bit) floating point types: ``__fp16``,
> > > ``_Float16`` and ``__bf16``.  These types are supported in all language
> > > modes, but not on all targets:
> > > 
> > > - ``__fp16`` is supported on every target.
> > > 
> > > - ``_Float16`` is currently supported on the following targets:
> > >   * 32-bit ARM (natively on some architecture versions)
> > >   * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
> > >   * AMDGPU (natively)
> > >   * SPIR (natively)
> > >   * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)
> > > 
> > > - ``__bf16`` is currently supported on the following targets:
> > >   * 32-bit ARM
> > >   * 64-bit ARM (AArch64)
> > >   * X86 (when SSE2 is available)
> > > 
> > > (For X86, SSE2 is available on 64-bit and all recent 32-bit processors.)
> > > 
> > > ``__fp16`` and ``_Float16`` both use the binary16 format from IEEE
> > > 754-2008, which provides a 5-bit exponent and an 11-bit significand
> > > (counting the implicit leading 1).  ``__bf16`` uses the `bfloat16
> > > <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_ format,
> > > which provides an 8-bit exponent and an 8-bit significand; this is the same
> > > exponent range as `float`, just with greatly reduced precision.
> > > 
> > > ``_Float16`` and ``__bf16`` follow the usual rules for arithmetic
> > > floating-point types.  Most importantly, this means that arithmetic
> > > operations on operands of these types are formally performed in the type
> > > and produce values of the type.  ``__fp16`` does not follow those rules:
> > > most operations immediately promote operands of type ``__fp16`` to
> > > ``float``, and so arithmetic operations are defined to be performed in
> > > ``float`` and so result in a value of type ``float`` (unless further
> > > promoted because of other operands).  See below for more information on
> > > the exact specifications of these types.
> > > 
> > > Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
> > > native hardware support for arithmetic in their corresponding formats.
> > > The exact conditions are described in the lists above.  When compiling for
> > > a processor without native support, Clang will perform the arithmetic in
> > > ``float``, inserting extensions and truncations as necessary.  This can be
> > > done in a way that exactly emulates the behavior of hardware support for
> > > arithmetic, but it can require many extra operations.  By default, Clang
> > > takes advantage of the C standard's allowances for excess precision in
> > > intermediate operands in order to eliminate intermediate truncations within
> > > statements.  This is generally much faster but can generate different
> > > results from strict operation-by-operation emulation.
> > > 
> > > The use of excess precision can be independently controlled for these two
> > > types with the ``-ffloat16-excess-precision=`` and
> > > ``-fbfloat16-excess-precision=`` options.  Valid values include:
> > > - ``none`` (meaning to perform strict operation-by-operation emulation)
> > > - ``standard`` (meaning that excess precision is permitted under the rules
> > >   described in the standard, i.e. never across explicit casts or statements)
> > > - ``fast`` (meaning that excess precision is permitted whenever the
> > >   optimizer sees an opportunity to avoid truncations; currently this has no
> > >   effect beyond ``standard``)
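As an aside on that excess-precision point (my own illustration, not proposed
wording for the docs): the sketch below shows how the two modes can produce
different results. The ``sum3`` helper and the constants are made up for this
example; the inputs are chosen so that the exact sum is representable in
``_Float16`` but each individual addition rounds away from it.

```
#include <stdio.h>

/* The inputs are picked so that a + b lands exactly halfway between two
 * _Float16 values: rounding after every operation is expected to give 2.0,
 * while keeping the intermediate sum in float and rounding only once at the
 * end is expected to give 0x1.ffcp+0. */
_Float16 sum3(_Float16 a, _Float16 b, _Float16 c) {
  return a + b + c;
}

int main(void) {
  _Float16 r = sum3((_Float16)0x1.ffcp0f,   /* largest binary16 value below 2.0 */
                    (_Float16)0x1.0p-11f,   /* half an ulp at that magnitude    */
                    (_Float16)-0x1.0p-11f);
  /* There is no printf specifier for _Float16, so cast to double explicitly. */
  printf("%a\n", (double)r);
  return 0;
}
```

On a target where this builds, I would expect ``-ffloat16-excess-precision=none``
to print ``0x1p+1`` and ``-ffloat16-excess-precision=standard`` to print
``0x1.ffcp+0``, which is the "different results" caveat in the paragraph above.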
> > > The ``_Float16`` type is an interchange floating type specified in
> > > ISO/IEC TS 18661-3:2015 ("Floating-point extensions for C").  It will
> > > be supported on more targets as they define ABIs for it.
> > > 
> > > The ``__bf16`` type is a non-standard extension, but it generally follows
> > > the rules for arithmetic interchange floating types from ISO/IEC TS
> > > 18661-3:2015.  In previous versions of Clang, it was a storage-only type
> > > that forbade arithmetic operations.  It will be supported on more targets
> > > as they define ABIs for it.
> > > 
> > > The ``__fp16`` type was originally an ARM extension and is specified
> > > by the `ARM C Language Extensions
> > > <https://github.com/ARM-software/acle/releases>`_.
> > > Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``,
> > > not the ARM alternative format.  Operators that expect arithmetic operands
> > > immediately promote ``__fp16`` operands to ``float``.
> > > 
> > > It is recommended that portable code use ``_Float16`` instead of ``__fp16``,
> > > as it has been defined by the C standards committee and has behavior that
> > > is more familiar to most programmers.
> > > 
> > > Because ``__fp16`` operands are always immediately promoted to ``float``,
> > > the common real type of ``__fp16`` and ``_Float16`` for the purposes of the
> > > usual arithmetic conversions is ``float``.
> > > 
> > > A literal can be given ``_Float16`` type using the suffix ``f16``.  For
> > > example, ``3.14f16``.
> > > 
> > > Because default argument promotion only applies to the standard
> > > floating-point types, ``_Float16`` values are not promoted to ``double``
> > > when passed as variadic or untyped arguments.  As a consequence, some
> > > caution must be taken when using certain library facilities with
> > > ``_Float16``; for example, there is no ``printf`` format specifier for
> > > ``_Float16``, and (unlike ``float``) it will not be implicitly promoted to
> > > ``double`` when passed to ``printf``, so the programmer must explicitly
> > > cast it to ``double`` before using it with an ``%f`` or similar specifier.
> > > ```
> > ```
> > Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
> > native hardware support for arithmetic in their corresponding formats.
> > ```
> > 
> > Do you mean ``_Float16``?
> > 
> > ```
> > The exact conditions are described in the lists above. When compiling for a
> > processor without native support, Clang will perform the arithmetic in
> > ``float``, inserting extensions and truncations as necessary.
> > ```
> > 
> > This conflicts a bit with `These types are supported in all language modes,
> > but not on all targets`.
> > Why do we need to emulate a type that isn't necessarily supported on all
> > targets?
> > 
> > My understanding is that inserting extensions and truncations is used for
> > two purposes:
> > 1. A type that is designed to be supported on all targets. For now, that is
> >    only __fp16.
> > 2. Supporting excess-precision=`standard`. This applies to both _Float16 and
> >    __bf16.
> 
> > Do you mean `_Float16`?
> 
> Yes, thank you.  I knew I'd screw that up somewhere.
> 
> > Why do we need to emulate a type that isn't necessarily supported on all
> > targets?
> 
> Would this be clearer?
> 
> ```
> Arithmetic on ``_Float16`` and ``__bf16`` is enabled on some targets that don't
> provide native architectural support for arithmetic on these formats.  These
> targets are noted in the lists of supported targets above.  On these targets,
> Clang will perform the arithmetic in ``float``, inserting extensions and
> truncations as necessary.
> ```
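To make that concrete for myself (again, just an illustration, not proposed
wording), I think the per-operation emulation for a multiply is effectively the
hand-written equivalent below; the ``emulated_mul`` name is hypothetical.
Since the ``float`` product of two binary16 values fits in ``float`` exactly,
truncating it back should give the same result native hardware would.

```
/* What the compiler effectively does, per operation, on a target without
 * native _Float16 multiply: extend both operands to float, multiply in
 * float, then truncate the result back to _Float16. */
_Float16 emulated_mul(_Float16 a, _Float16 b) {
  float wide = (float)a * (float)b;  /* extensions + float multiply */
  return (_Float16)wide;             /* truncation back to binary16 */
}
```

Under excess-precision=`standard`, the value ``wide`` would simply be kept in
``float`` for the rest of the statement instead of being truncated after every
operation, which is where the possible result differences come from.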
> 
> > My understanding is that inserting extensions and truncations is used for
> > two purposes:
> 
> No, I believe we always insert extensions and truncations.  The cases you're
> describing are places we insert extensions and truncations in the *frontend*,
> so that the backend doesn't see operations on `half` / `bfloat` at all.  But
> when these operations do make it to the backend, and there's no direct
> architectural support for them on the target, the backend still just inserts
> extensions and truncations so it can do the arithmetic in `float`.  This is
> clearest in the ARM codegen (https://godbolt.org/z/q9KoGEYqb) because the
> conversions are just instructions, but you can also see it in the X86 codegen
> (https://godbolt.org/z/ejdd4P65W): all the runtime functions are just
> extensions/truncations, and the actual arithmetic is done with `mulss` and
> `addss`.  This frontend/backend distinction is not something that matters to
> users, so the documentation glosses over the difference.
> 
> I haven't done an exhaustive investigation, so it's possible that there are
> types and targets where we emit a compiler-rt call to do each operation
> instead, but those compiler-rt functions almost certainly just do an
> extension to float in the same way, so I don't think the documentation as
> written would be misleading for those targets, either.

Thanks for the explanation! Sorry, I failed to make the distinction between
"supported" and "natively supported"; I guess users may be confused by that at
first too. I agree the documentation should explain the compiler's overall
behavior to users. I think there are three things we want to tell users:
1. Whether a type is an arithmetic type, and whether it is (natively) supported
   on all targets or only some;
2. That results for a type may not be consistent across different targets
   and/or excess-precision settings;
3. That the excess-precision options have no effect when a type is natively
   supported by the target.
It would be clearer if we gave such a summary before the detailed explanation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150913/new/

https://reviews.llvm.org/D150913

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits