On 12/31/23 09:23, Roger Sayle wrote:
Very many thanks (and a Happy New Year) to the pre-commit
patch testing folks at linaro.org. Their testing has revealed that
although my patch is clean on x86_64, it triggers some problems
on aarch64 and arm. The issue (with the previous version of my
patch) is that these platforms require a paradoxical subreg to be
generated by the middle-end, where we were previously checking
for truly_noop_truncation.
This has been fixed (in revision 2) below. Where previously I had:
@@ -66,7 +66,9 @@ gen_lowpart_general (machine_mode mode, rtx x)
scalar_int_mode xmode;
if (is_a <scalar_int_mode> (GET_MODE (x), &xmode)
&& GET_MODE_SIZE (xmode) <= UNITS_PER_WORD
- && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
+ && (known_lt (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))
+ ? TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
+ : known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode)))
&& !reload_completed)
return gen_lowpart_general (mode, force_reg (xmode, x));
the correct change is:
scalar_int_mode xmode;
if (is_a <scalar_int_mode> (GET_MODE (x), &xmode)
&& GET_MODE_SIZE (xmode) <= UNITS_PER_WORD
- && TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode)
+ && (known_ge (GET_MODE_SIZE (mode), GET_MODE_SIZE (xmode))
+ || TRULY_NOOP_TRUNCATION_MODES_P (mode, xmode))
&& !reload_completed)
return gen_lowpart_general (mode, force_reg (xmode, x));
i.e. we only call TRULY_NOOP_TRUNCATION_MODES_P when we
know we have a truncation, but the behaviour of non-truncations
is preserved (no longer depends upon unspecified behaviour) and
gen_lowpart_general is called to create the paradoxical SUBREG.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
Hopefully this revision tests cleanly on the linaro.org CI pipeline.
2023-12-31 Roger Sayle <ro...@nextmovesoftware.com>
gcc/ChangeLog
* combine.cc (make_extraction): Confirm that OUTPREC is less than
INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P.
* expmed.cc (store_bit_field_using_insv): Likewise.
(extract_bit_field_using_extv): Likewise.
(extract_bit_field_as_subreg): Likewise.
* optabs-query.cc (get_best_extraction_insn): Likewise.
* optabs.cc (expand_parity): Likewise.
* rtlhooks.cc (gen_lowpart_general): Likewise.
* simplify-rtx.cc (simplify_truncation): Disallow truncations
to the same precision.
(simplify_unary_operation_1) <case TRUNCATE>: Move optimization
of truncations to the same mode earlier.
So just an FYI. This should probably wait for gcc-15. But it is worth
noting that this causes tbz_2.c to fail on aarch64. Basically with this
patch we use the "x" registers whereas before we used a "w" register.
Meaning we're doing a 64bit comparison now rather than a 32bit
comparison. THe test expects a 32bit comparison.
jeff