Jeff Law <jeffreya...@gmail.com> writes: > On 6/9/25 12:40 PM, Dimitar Dimitrov wrote: >> On Sun, Jun 08, 2025 at 09:09:44AM -0600, Jeff Law wrote: >>> >>> >>> On 6/5/25 2:16 PM, Dimitar Dimitrov wrote: >>>> PR119966 showed that combine could generate unfoldable hardware subregs >>>> for pru-unknown-elf. To fix, strengthen the checks performed by >>>> validate_subreg. >>>> >>>> The simplify_subreg_regno performs more validity checks than >>>> the simple info.representable_p. Most importantly, the >>>> targetm.hard_regno_mode_ok hook is called to ensure the hardware >>>> register is valid in subreg's outer mode. This fixes the rootcause >>>> for PR119966. >>>> >>>> The checks for stack-related registers are bypassed because the i386 >>>> backend generates them, in this seemingly valid peephole optimization: >>>> >>>> ;; Attempt to always use XOR for zeroing registers (including FP >>>> modes). >>>> (define_peephole2 >>>> [(set (match_operand 0 "general_reg_operand") >>>> (match_operand 1 "const0_operand"))] >>>> "GET_MODE_SIZE (GET_MODE (operands[0])) <= UNITS_PER_WORD >>>> && (! TARGET_USE_MOV0 || optimize_insn_for_size_p ()) >>>> && peep2_regno_dead_p (0, FLAGS_REG)" >>>> [(parallel [(set (match_dup 0) (const_int 0)) >>>> (clobber (reg:CC FLAGS_REG))])] >>>> "operands[0] = gen_lowpart (word_mode, operands[0]);") >>>> >>>> Testing done: >>>> * No regressions were detected for C and C++ on x86_64-pc-linux-gnu. >>>> * "contrib/compare-all-tests i386" showed no difference in code >>>> generation. >>>> * No regressions for pru-unknown-elf. >>>> * Reverted r16-809-gf725d6765373f7 to expose the now latent PR119966. >>>> Then ensured pru-unknown-elf build is ok. Only two cases regressed >>>> where rnreg pass transforms a valid hardware subreg into invalid >>>> one. But I think that is not related to combine's PR119966: >>>> gcc.c-torture/execute/20040709-1.c >>>> gcc.c-torture/execute/20040709-2.c >>>> >>>> This patch was provisionally approved in: >>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685492.html >>>> I'll wait for 2 days to get pre-commit CI results, then will commit it. >>>> >>>> PR target/119966 >>>> >>>> gcc/ChangeLog: >>>> >>>> * emit-rtl.cc (validate_subreg): Call simplify_subreg_regno >>>> instead of checking info.representable_p.. >>>> * rtl.h (simplify_subreg_regno): Add new argument >>>> allow_stack_regs. >>>> * rtlanal.cc (simplify_subreg_regno): Do not reject >>>> stack-related registers if allow_stack_regs is true. >>> Note that my system is primarily post-commit unless I >>> explicitly throw in a patch for testing purposes. So >>> figure ~24hrs after the patch goes in we'll have the >>> vast majority of the coverage. The native emulated >>> targets will trickle in over the remainder of the week. >>> >> >> I did more testing. riscv32-unknown-elf and riscv64-unknown-linux-gnu >> are ok. But this patch breaks AVR due to the following workaround for >> old reload in avr_hard_regno_mode_ok: >> >> if (GET_MODE_SIZE (mode) >= 4 >> && regno >= REG_X >> // This problem only concerned the old reload. >> && ! avropt_lra_p) >> return false; >> >> The following insn causes a split for "zero_extendsidi2" to ICE because >> operands are replaced with output from simplify_gen_subreg, which now >> fails: >> >> (insn 17 16 18 2 (set (reg/v:DI 24 r24 [orig:44 lowD.4526 ] [44]) >> (zero_extend:DI (reg:SI 22 r22 [orig:66 aD.4519 ] [66]))) >> "/mnt/nvme/dinux/local-workspace/gcc/libgcc/fixed-bit.c":790:7 827 >> {zero_extendsidi2} >> (nil)) >> >> That should be fixed once AVR is switched to LRA (PR113934). >> >> Richard, Jeff, it does not seem appropriate to merge this patch now, >> given that it breaks avr and or1k. Let me know if such experiment is >> desired despite the known breakages. > It's a bit of a judgment call -- we don't require changes to be clean > across every target, that would be an undue testing burden on contributors. > > But knowing it's causing an avr fail means we definitely need to be a > bit more cautious. An argument could be made that this is a bug in the > avr port -- the code in question looks quite bogus and the fact that the > "fix" is to stop using reload. > > I think a reasonable step forward would be to put this into my tester > and see if anything else pops out. If nothing else pops out then I > would suggest we flip the AVR port to LRA and install the patch. If > other things pop out, then we'll have to revise the plan. > > So I'll throw it into my tester. We'll have data on the crosses > tomorrow AM my time.
Thanks, sounds good to me FWIW. I agree that we shouldn't let dubious port-specific code hold the patch back. But the cross-port testing would help detect whether there is a legitimate corner case that we haven't thought about. (Neither the avr nor the or1k cases are legitimate corner cases, IMO.) Richard