On 10/5/22 09:47, Jakub Jelinek wrote:
On Tue, Oct 04, 2022 at 05:50:50PM -0400, Jason Merrill wrote:
Another question is the suffixes of the builtins. For now I have added
bf16 suffix and enabled the builtins with !both_p, so one always needs to
use __builtin_* form for them. None of the GCC builtins end with b,
so this isn't ambiguous with __builtin_*f16, but some libm functions do end
with b, in particular ilogb, logb and f{??,??x}sub. ilogb and the subs
always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
log? Shall the builtins use f16b suffixes instead like the mangling does?
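
To make the ambiguity concrete, here is a toy sketch (not GCC code) that enumerates the ways a builtin name can be split into a libm base name and a type suffix. The function name, the base-name list and the suffix handling are all hypothetical and only illustrate the parsing problem.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy model only: return every (base, suffix) pair whose concatenation
// equals NAME.  BF16_SUFFIX is the candidate spelling for the bfloat16
// suffix ("bf16" or "f16b"); "f16" is always accepted for _Float16.
std::vector<std::pair<std::string, std::string>>
split_builtin_name(const std::string &name, const std::string &bf16_suffix) {
    static const char *const bases[] = {"log", "logb", "ilogb"};
    const std::string suffixes[] = {"f16", bf16_suffix};
    std::vector<std::pair<std::string, std::string>> out;
    for (const char *base : bases)
        for (const std::string &suffix : suffixes)
            if (name == std::string(base) + suffix)
                out.emplace_back(base, suffix);
    return out;
}
```

With a "bf16" suffix, "logbf16" splits two ways (log + bf16 and logb + f16); with an "f16b" suffix, "logf16b" and "logbf16" each split only one way.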
Do we want bf16 builtins at all? The impression I've gotten is that users
want computation to happen in SFmode and only later truncate back to BFmode.
As I wrote earlier, I think we need at least one, a __builtin_nans variant,
which would be used in the libstdc++
std::numeric_limits<std::bfloat16_t>::signaling_NaN() implementation.
I think
std::numeric_limits<std::bfloat16_t>::infinity() can be implemented as
return (__bf16) __builtin_huge_valf ();
and similarly
std::numeric_limits<std::bfloat16_t>::quiet_NaN() as
return (__bf16) __builtin_nanf ("");
but
return (__bf16) __builtin_nansf ("");
would lose the signaling NaN on the conversion and raise an exception,
and as the method is constexpr,
union { unsigned short a; __bf16 b; } u = { 0x7f81 };
return u.b;
wouldn't work. I can certainly restrict the builtins to the single
one, but wonder whether the suffix for that builtin shouldn't be chosen
such that eventually we could add more builtins if we need to
and don't run into the log with bf16 suffix vs. logb with f16 suffix
ambiguity.
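
For reference, a small sketch of why 0x7f81 is a signaling NaN bit pattern. It assumes the usual bfloat16 layout (1 sign bit, 8 exponent bits, 7 mantissa bits) and the common convention that the top mantissa bit is the quiet bit. The helper names are made up for illustration and work on raw bits only, so they do not need __bf16 support.

```cpp
#include <cstdint>

// NaN: exponent field all ones and a nonzero 7-bit mantissa.
constexpr bool bf16_is_nan(std::uint16_t bits) {
    return ((bits >> 7) & 0xffu) == 0xffu && (bits & 0x7fu) != 0;
}

// Signaling NaN: a NaN whose top mantissa bit (0x40) is clear.
// Assumption: the target uses "quiet bit set means quiet NaN".
constexpr bool bf16_is_signaling_nan(std::uint16_t bits) {
    return bf16_is_nan(bits) && (bits & 0x40u) == 0;
}

static_assert(bf16_is_signaling_nan(0x7f81), "0x7f81 is a signaling NaN");
static_assert(!bf16_is_signaling_nan(0x7fc0), "0x7fc0 is a quiet NaN");
static_assert(!bf16_is_nan(0x7f80), "0x7f80 is +infinity");
```

These checks work in a constant expression because they never form a floating-point value; the union in the quoted snippet does not, since reading the inactive member is not allowed in constexpr evaluation.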
As you said, most of the libstdc++ overloads for std::bfloat16_t then
can use float builtins or library calls under the hood, but std::nextafter
is another case where I think we'll need to have something bfloat16_t
specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.
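
A rough sketch of the ulp difference, assuming bfloat16 is the upper 16 bits of an IEEE binary32 float. The helper names are hypothetical, and the code works on bit patterns so it does not depend on __bf16 being available.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Widen a bfloat16 bit pattern to float; exact, since bfloat16 is
// assumed to be the high half of the binary32 encoding.
static float bf16_bits_to_float(std::uint16_t bits) {
    std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &wide, sizeof f);
    return f;
}

// Distance from 1.0 to the next larger bfloat16 value (bit pattern + 1).
float bf16_ulp_at_one() {
    const std::uint16_t one = 0x3f80; // 1.0 in bfloat16
    return bf16_bits_to_float(static_cast<std::uint16_t>(one + 1))
           - bf16_bits_to_float(one);
}

// Distance from 1.0f to the next larger float value.
float float_ulp_at_one() {
    return std::nextafter(1.0f, 2.0f) - 1.0f;
}
```

At 1.0 the bfloat16 step is 2^-7 and the float step is 2^-23, so one float-based nextafter step is lost when the result is rounded back to bfloat16.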
Makes sense.
Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
in the next iteration (always with pedwarn in that case).
@@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
{
machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
icode = optab_handler (cstore_optab, optab_mode);
- if (icode != CODE_FOR_nothing)
+ if (icode != CODE_FOR_nothing
+ /* Don't consider [BH]Fmode as usable wider mode, as neither is
+ a subset or superset of the other. */
+ && (compare_mode == mode
+ || !SCALAR_FLOAT_MODE_P (compare_mode)
+ || maybe_ne (GET_MODE_PRECISION (compare_mode),
+ GET_MODE_PRECISION (mode))))
Why do you need to do this here (and in prepare_cmp_insn, and similarly in
can_compare_p)? Shouldn't get_wider skip over modes that are not actually
wider?
I'm afraid too many places rely on all modes of a certain class to be
visible when walking from "narrowest" to "widest" mode, say
FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
&& GET_MODE_WIDER_MODE (HFmode) == SFmode.
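
A toy model of the problem (not GCC's actual tables; the enum, array and function names are invented for illustration, and the BF-before-HF order is arbitrary): if iteration follows a single "wider" link per mode, making both 16-bit modes point at SF means a walk from the narrowest mode never visits one of them.

```cpp
#include <vector>

enum ToyMode { TOY_VOID, TOY_BF, TOY_HF, TOY_SF, TOY_DF };

// Follow the "wider" links from START until the chain ends at TOY_VOID.
std::vector<ToyMode> walk_wider(ToyMode start, const ToyMode (&wider)[5]) {
    std::vector<ToyMode> seen;
    for (ToyMode m = start; m != TOY_VOID; m = wider[m])
        seen.push_back(m);
    return seen;
}

// Linear chain: BF -> HF -> SF -> DF (indexed by ToyMode).
const ToyMode linear_chain[5] = {TOY_VOID, TOY_HF, TOY_SF, TOY_DF, TOY_VOID};

// Forked chain: BF -> SF and HF -> SF, then SF -> DF.
const ToyMode forked_chain[5] = {TOY_VOID, TOY_SF, TOY_SF, TOY_DF, TOY_VOID};
```

Walking linear_chain from TOY_BF visits all four modes; walking forked_chain from TOY_BF visits only BF, SF and DF, so HF is silently skipped.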
Yes, it seems they need to change now that their assumptions have been
violated. I suppose FOR_EACH_MODE_IN_CLASS would need to change to not
use get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to
decide whether they want an iteration that uses get_wider (likely with a
new name) or not.
Note, besides this GET_MODE_PRECISION (HFmode) == GET_MODE_PRECISION (BFmode)
case, another set of modes which have the same size are powerpc*
TFmode/IFmode/KFmode, but in that case the backend resorts to ugly hacks,
artificially lowering the precision of 2 of them:
rs6000-modes.h:#define FLOAT_PRECISION_IFmode 128
rs6000-modes.h:#define FLOAT_PRECISION_TFmode 127
rs6000-modes.h:#define FLOAT_PRECISION_KFmode 126
(and the middle-end then has to work around that mess). Doing something
similar wouldn't help the BFmode vs. HFmode case though, one of them would
have wider precision and so e.g. C FE would then prefer it, but more
importantly, as they are unordered modes where most of the optabs aren't
implemented, it is bad to pick optabs for the "wider" mode to handle the
"narrower" one. I think powerpc works because they define optabs for
all 3 modes when those modes are usable.
Jakub