On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > >> > + { > >> > + /* First generate subreg of word mode if the previous mode is > >> > + wider than word mode and word mode is wider than MODE. */ > >> > + prev_rtx = simplify_gen_subreg (word_mode, prev_rtx, > >> > + prev_mode, 0); > >> > + prev_mode = word_mode; > >> > + } > >> > + if (prev_rtx != nullptr) > >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0); > >> > >> This should be lowpart_subreg, since 0 isn't the right offset for > >> big-endian targets. Using lowpart_subreg should also avoid the need > >> for the word_size “if” above: lowpart_subreg can handle lowpart subword > >> subregs of multiword values. > > > > I tried it. It didn't work since it caused the LRA failure. I replaced > > simplify_gen_subreg with lowpart_subreg instead. > > What specifically went wrong?
With vector broadcast, for --- extern void *ops; void foo (int c) { __builtin_memset (ops, c, 18); } --- we generate HI from V16QI. With a single lowpart_subreg, I get (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void *)ops.0_1]+16 S2 A8]) (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} (nil)) instead of (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void *)ops.0_1]+16 S2 A8]) (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} (nil)) IRA and LRA fail to reload: (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void *)ops.0_1]+16 S2 A8]) (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} (nil)) since ix86_can_change_mode_class has if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass)) { /* Vector registers do not support QI or HImode loads. If we don't disallow a change to these modes, reload will assume it's ok to drop the subreg from (subreg:SI (reg:HI 100) 0). This affects the vec_dupv4hi pattern. */ if (GET_MODE_SIZE (from) < 4) return false; } If we don't use a hard scratch register, (subreg:HI (reg:V16QI)) compiles. But codegen is worse: vmovd %edi, %xmm0 vpbroadcastb %xmm0, %xmm0 vmovdqa %xmm0, -24(%rsp) movq ops(%rip), %rax movzwl -24(%rsp), %edx vmovdqu %xmm0, (%rax) movw %dx, 16(%rax) vs vmovd %edi, %xmm15 movq ops(%rip), %rax vpbroadcastb %xmm15, %xmm15 vmovq %xmm15, %rdx movw %dx, 16(%rax) vmovdqu %xmm15, (%rax) > >> > +/* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE) > >> > + bytes from constant string DATA + OFFSET and return it as target > >> > + constant. If PREV isn't nullptr, it has the RTL info from the > >> > + previous iteration. */ > >> > + > >> > +rtx > >> > +builtin_memset_read_str (void *data, void *prev, > >> > + HOST_WIDE_INT offset ATTRIBUTE_UNUSED, > >> > + machine_mode mode) > >> > +{ > >> > + rtx target; > >> > const char *c = (const char *) data; > >> > - char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode)); > >> > + char *p; > >> > + unsigned int size = GET_MODE_SIZE (mode).to_constant (); > >> > + > >> > + /* Don't use the previous value if size is 1. */ > >> > >> Why not though? The existing code does, and that seems like the right > >> thing to do when operating on integers. > >> > >> I can see the check would be a good thing to do if MODE isn't a vector > >> mode and the previous mode was. Is that the case you're worried about? > >> If so, that check could go in gen_memset_value_from_prev instead. > > > > We are storing one byte. Doing it directly is faster. > > But the first thing being protected here is… > > >> > + if (size != 1) > >> > + { > >> > + target = gen_memset_value_from_prev (prev, mode); > >> > + if (target != nullptr) > >> > + return target; > > …this attempt to use the previous value. If the target uses, say, > SImode for the first piece and QImode for a final byte, using the QImode > lowpart of the SImode register would avoid having to move the byte value > into a separate QImode register. Why's that a bad thing to do? AFAICT > it's what the current code would do, so if we want to change it even for > integer modes, I think it should be a separate patch with a separate > justification. I removed the size == 1 check. I didn't notice any issues. > Like I say, I can understand that using the QImode lowpart of a vector > wouldn't be a good idea. But if that's specifically what you're trying > to prevent, I think we should test for it. > > Thanks, > Richard I will submit the v3 patch. Thanks. -- H.J.