Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>>>
>>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>> >> > +     {
>>> >> > +       /* First generate subreg of word mode if the previous mode is
>>> >> > +          wider than word mode and word mode is wider than MODE.  */
>>> >> > +       prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
>>> >> > +                                       prev_mode, 0);
>>> >> > +       prev_mode = word_mode;
>>> >> > +     }
>>> >> > +      if (prev_rtx != nullptr)
>>> >> > +     target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
>>> >>
>>> >> This should be lowpart_subreg, since 0 isn't the right offset for
>>> >> big-endian targets.  Using lowpart_subreg should also avoid the need
>>> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword
>>> >> subregs of multiword values.
>>> >
>>> > I tried it.  It didn't work since it caused the LRA failure.   I replaced
>>> > simplify_gen_subreg with lowpart_subreg instead.
>>>
>>> What specifically went wrong?
>>
>> With vector broadcast, for
>> ---
>> extern void *ops;
>>
>> void
>> foo (int c)
>> {
>>   __builtin_memset (ops, c, 18);
>> }
>> ---
>> we generate HI from V16QI.   With a single lowpart_subreg, I get
>>
>> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
>>                 (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void
>> *)ops.0_1]+16 S2 A8])
>>         (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>>      (nil))
>>
>> instead of
>>
>> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
>>                 (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void
>> *)ops.0_1]+16 S2 A8])
>>         (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>>      (nil))
>>
>> IRA and LRA fail to reload:
>>
>> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
>>                 (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void
>> *)ops.0_1]+16 S2 A8])
>>         (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
>>      (nil))
>>
>> since ix86_can_change_mode_class has
>>
>>   if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
>>     {
>>       /* Vector registers do not support QI or HImode loads.  If we don't
>>          disallow a change to these modes, reload will assume it's ok to
>>          drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>>          the vec_dupv4hi pattern.  */
>>       if (GET_MODE_SIZE (from) < 4)
>>         return false;
>>     }
>
> Ah!  OK.  In that case, maybe we should have something like:
>
>    if (REG_P (prev_rtx)
>        && HARD_REGISTER_P (prev_rtx)
>        && REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode))

Sorry, make that last line:

  && lowpart_subreg_regno (REGNO (prev_rtx), prev->mode, mode) < 0

where lowpart_subreg_regno is a new wrapper around simplify_subreg_regno
that uses subreg_lowpart_offset (mode, prev->mode) as the offset.

Thanks,
Richard

>      prev_rtx = copy_to_reg (prev_rtx);
>
> and then just have the single lowpart_subreg after that.
>
> Thanks,
> Richard

Reply via email to