Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: >> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford >> <richard.sandif...@arm.com> wrote: >>> >>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: >>> >> > + { >>> >> > + /* First generate subreg of word mode if the previous mode is >>> >> > + wider than word mode and word mode is wider than MODE. */ >>> >> > + prev_rtx = simplify_gen_subreg (word_mode, prev_rtx, >>> >> > + prev_mode, 0); >>> >> > + prev_mode = word_mode; >>> >> > + } >>> >> > + if (prev_rtx != nullptr) >>> >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0); >>> >> >>> >> This should be lowpart_subreg, since 0 isn't the right offset for >>> >> big-endian targets. Using lowpart_subreg should also avoid the need >>> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword >>> >> subregs of multiword values. >>> > >>> > I tried it. It didn't work since it caused the LRA failure. I replaced >>> > simplify_gen_subreg with lowpart_subreg instead. >>> >>> What specifically went wrong? >> >> With vector broadcast, for >> --- >> extern void *ops; >> >> void >> foo (int c) >> { >> __builtin_memset (ops, c, 18); >> } >> --- >> we generate HI from V16QI. With a single lowpart_subreg, I get >> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) >> (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void >> *)ops.0_1]+16 S2 A8]) >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} >> (nil)) >> >> instead of >> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) >> (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void >> *)ops.0_1]+16 S2 A8]) >> (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} >> (nil)) >> >> IRA and LRA fail to reload: >> >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) >> (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void >> *)ops.0_1]+16 S2 A8]) >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} >> (nil)) >> >> since ix86_can_change_mode_class has >> >> if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass)) >> { >> /* Vector registers do not support QI or HImode loads. If we don't >> disallow a change to these modes, reload will assume it's ok to >> drop the subreg from (subreg:SI (reg:HI 100) 0). This affects >> the vec_dupv4hi pattern. */ >> if (GET_MODE_SIZE (from) < 4) >> return false; >> } > > Ah! OK. In that case, maybe we should have something like: > > if (REG_P (prev_rtx) > && HARD_REGISTER_P (prev_rtx) > && REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode))
Sorry, make that last line: && lowpart_subreg_regno (REGNO (prev_rtx), prev->mode, mode) < 0 where lowpart_subreg_regno is a new wrapper around simplify_subreg_regno that uses subreg_lowpart_offset (mode, prev->mode) as the offset. Thanks, Richard > prev_rtx = copy_to_reg (prev_rtx); > > and then just have the single lowpart_subreg after that. > > Thanks, > Richard