Re: [PATCH] Add QI vector mode support to by-pieces for memset

H.J. Lu via Gcc-patches Tue, 20 Jul 2021 05:49:22 -0700

On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >> > +     {
> >> > +       /* First generate subreg of word mode if the previous mode is
> >> > +          wider than word mode and word mode is wider than MODE.  */
> >> > +       prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> >> > +                                       prev_mode, 0);
> >> > +       prev_mode = word_mode;
> >> > +     }
> >> > +      if (prev_rtx != nullptr)
> >> > +     target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
> >>
> >> This should be lowpart_subreg, since 0 isn't the right offset for
> >> big-endian targets.  Using lowpart_subreg should also avoid the need
> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword
> >> subregs of multiword values.
> >
> > I tried it.  It didn't work since it caused the LRA failure.   I replaced
> > simplify_gen_subreg with lowpart_subreg instead.
>
> What specifically went wrong?


With vector broadcast, for
---
extern void *ops;

void
foo (int c)
{
  __builtin_memset (ops, c, 18);
}
---
we generate HI from V16QI.   With a single lowpart_subreg, I get

(insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
                (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void
*)ops.0_1]+16 S2 A8])
        (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
     (nil))

instead of

(insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
                (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void
*)ops.0_1]+16 S2 A8])
        (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
     (nil))

IRA and LRA fail to reload:

(insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84)
                (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void
*)ops.0_1]+16 S2 A8])
        (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal}
     (nil))

since ix86_can_change_mode_class has

  if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
    {
      /* Vector registers do not support QI or HImode loads.  If we don't
         disallow a change to these modes, reload will assume it's ok to
         drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
         the vec_dupv4hi pattern.  */
      if (GET_MODE_SIZE (from) < 4)
        return false;
    }

If we don't use a hard scratch register, (subreg:HI (reg:V16QI)) compiles.  But
codegen is worse:

vmovd %edi, %xmm0
vpbroadcastb %xmm0, %xmm0
vmovdqa %xmm0, -24(%rsp)
movq ops(%rip), %rax
movzwl -24(%rsp), %edx
vmovdqu %xmm0, (%rax)
movw %dx, 16(%rax)

vs

vmovd %edi, %xmm15
movq ops(%rip), %rax
vpbroadcastb %xmm15, %xmm15
vmovq %xmm15, %rdx
movw %dx, 16(%rax)
vmovdqu %xmm15, (%rax)

> >> > +/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> >> > +   bytes from constant string DATA + OFFSET and return it as target
> >> > +   constant.  If PREV isn't nullptr, it has the RTL info from the
> >> > +   previous iteration.  */
> >> > +
> >> > +rtx
> >> > +builtin_memset_read_str (void *data, void *prev,
> >> > +                      HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> >> > +                      machine_mode mode)
> >> > +{
> >> > +  rtx target;
> >> >    const char *c = (const char *) data;
> >> > -  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
> >> > +  char *p;
> >> > +  unsigned int size = GET_MODE_SIZE (mode).to_constant ();
> >> > +
> >> > +  /* Don't use the previous value if size is 1.  */
> >>
> >> Why not though?  The existing code does, and that seems like the right
> >> thing to do when operating on integers.
> >>
> >> I can see the check would be a good thing to do if MODE isn't a vector
> >> mode and the previous mode was.  Is that the case you're worried about?
> >> If so, that check could go in gen_memset_value_from_prev instead.
> >
> > We are storing one byte.  Doing it directly is faster.
>
> But the first thing being protected here is…
>
> >> > +  if (size != 1)
> >> > +    {
> >> > +      target = gen_memset_value_from_prev (prev, mode);
> >> > +      if (target != nullptr)
> >> > +     return target;
>
> …this attempt to use the previous value.  If the target uses, say,
> SImode for the first piece and QImode for a final byte, using the QImode
> lowpart of the SImode register would avoid having to move the byte value
> into a separate QImode register.  Why's that a bad thing to do?  AFAICT
> it's what the current code would do, so if we want to change it even for
> integer modes, I think it should be a separate patch with a separate
> justification.

I removed the size == 1 check.   I didn't notice any issues.

> Like I say, I can understand that using the QImode lowpart of a vector
> wouldn't be a good idea.  But if that's specifically what you're trying
> to prevent, I think we should test for it.
>
> Thanks,
> Richard

I will submit the v3 patch.

Thanks.

-- 
H.J.

Re: [PATCH] Add QI vector mode support to by-pieces for memset

Reply via email to