Jeff Law <jeffreya...@gmail.com> writes:
> On 6/9/25 12:40 PM, Dimitar Dimitrov wrote:
>> On Sun, Jun 08, 2025 at 09:09:44AM -0600, Jeff Law wrote:
>>>
>>>
>>> On 6/5/25 2:16 PM, Dimitar Dimitrov wrote:
>>>> PR119966 showed that combine could generate unfoldable hardware subregs
>>>> for pru-unknown-elf.  To fix, strengthen the checks performed by
>>>> validate_subreg.
>>>>
>>>> The simplify_subreg_regno performs more validity checks than
>>>> the simple info.representable_p.  Most importantly, the
>>>> targetm.hard_regno_mode_ok hook is called to ensure the hardware
>>>> register is valid in subreg's outer mode.  This fixes the rootcause
>>>> for PR119966.
>>>>
>>>> The checks for stack-related registers are bypassed because the i386
>>>> backend generates them, in this seemingly valid peephole optimization:
>>>>
>>>>      ;; Attempt to always use XOR for zeroing registers (including FP 
>>>> modes).
>>>>      (define_peephole2
>>>>        [(set (match_operand 0 "general_reg_operand")
>>>>              (match_operand 1 "const0_operand"))]
>>>>        "GET_MODE_SIZE (GET_MODE (operands[0])) <= UNITS_PER_WORD
>>>>         && (! TARGET_USE_MOV0 || optimize_insn_for_size_p ())
>>>>         && peep2_regno_dead_p (0, FLAGS_REG)"
>>>>        [(parallel [(set (match_dup 0) (const_int 0))
>>>>                    (clobber (reg:CC FLAGS_REG))])]
>>>>        "operands[0] = gen_lowpart (word_mode, operands[0]);")
>>>>
>>>> Testing done:
>>>>     * No regressions were detected for C and C++ on x86_64-pc-linux-gnu.
>>>>     * "contrib/compare-all-tests i386" showed no difference in code
>>>>       generation.
>>>>     * No regressions for pru-unknown-elf.
>>>>     * Reverted r16-809-gf725d6765373f7 to expose the now latent PR119966.
>>>>       Then ensured pru-unknown-elf build is ok.  Only two cases regressed
>>>>       where rnreg pass transforms a valid hardware subreg into invalid
>>>>       one.  But I think that is not related to combine's PR119966:
>>>>         gcc.c-torture/execute/20040709-1.c
>>>>         gcc.c-torture/execute/20040709-2.c
>>>>
>>>> This patch was provisionally approved in:
>>>>     https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685492.html
>>>> I'll wait for 2 days to get pre-commit CI results, then will commit it.
>>>>
>>>>    PR target/119966
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>    * emit-rtl.cc (validate_subreg): Call simplify_subreg_regno
>>>>    instead of checking info.representable_p..
>>>>    * rtl.h (simplify_subreg_regno): Add new argument
>>>>    allow_stack_regs.
>>>>    * rtlanal.cc (simplify_subreg_regno): Do not reject
>>>>    stack-related registers if allow_stack_regs is true.
>>> Note that my system is primarily post-commit unless I
>>> explicitly throw in a patch for testing purposes.  So
>>> figure ~24hrs after the patch goes in we'll have the
>>> vast majority of the coverage.  The native emulated
>>> targets will trickle in over the remainder of the week.
>>>
>> 
>> I did more testing.  riscv32-unknown-elf and riscv64-unknown-linux-gnu
>> are ok.  But this patch breaks AVR due to the following workaround for
>> old reload in avr_hard_regno_mode_ok:
>> 
>>    if (GET_MODE_SIZE (mode) >= 4
>>        && regno >= REG_X
>>        // This problem only concerned the old reload.
>>        && ! avropt_lra_p)
>>      return false;
>> 
>> The following insn causes a split for "zero_extendsidi2" to ICE because
>> operands are replaced with output from simplify_gen_subreg, which now
>> fails:
>> 
>>    (insn 17 16 18 2 (set (reg/v:DI 24 r24 [orig:44 lowD.4526 ] [44])
>>            (zero_extend:DI (reg:SI 22 r22 [orig:66 aD.4519 ] [66]))) 
>> "/mnt/nvme/dinux/local-workspace/gcc/libgcc/fixed-bit.c":790:7 827 
>> {zero_extendsidi2}
>>         (nil))
>> 
>> That should be fixed once AVR is switched to LRA (PR113934).
>> 
>> Richard, Jeff, it does not seem appropriate to merge this patch now,
>> given that it breaks avr and or1k.  Let me know if such experiment is
>> desired despite the known breakages.
> It's a bit of a judgment call -- we don't require changes to be clean 
> across every target, that would be an undue testing burden on contributors.
>
> But knowing it's causing an avr fail means we definitely need to be a 
> bit more cautious.  An argument could be made that this is a bug in the 
> avr port -- the code in question looks quite bogus and the fact that the 
> "fix" is to stop using reload.
>
> I think a reasonable step forward would be to put this into my tester 
> and see if anything else pops out.  If nothing else pops out then I 
> would suggest we flip the AVR port to LRA and install the patch.  If 
> other things pop out, then we'll have to revise the plan.
>
> So I'll throw it into my tester.  We'll have data on the crosses 
> tomorrow AM my time.

Thanks, sounds good to me FWIW.  I agree that we shouldn't let dubious
port-specific code hold the patch back.  But the cross-port testing
would help detect whether there is a legitimate corner case that we
haven't thought about.  (Neither the avr nor the or1k cases are
legitimate corner cases, IMO.)

Richard

Reply via email to