On Tue, Jul 20, 2021 at 8:12 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > >> On Mon, Jul 19, 2021 at 11:38 PM Richard Sandiford > >> <richard.sandif...@arm.com> wrote: > >>> > >>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > >>> >> > + { > >>> >> > + /* First generate subreg of word mode if the previous mode is > >>> >> > + wider than word mode and word mode is wider than MODE. */ > >>> >> > + prev_rtx = simplify_gen_subreg (word_mode, prev_rtx, > >>> >> > + prev_mode, 0); > >>> >> > + prev_mode = word_mode; > >>> >> > + } > >>> >> > + if (prev_rtx != nullptr) > >>> >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0); > >>> >> > >>> >> This should be lowpart_subreg, since 0 isn't the right offset for > >>> >> big-endian targets. Using lowpart_subreg should also avoid the need > >>> >> for the word_size “if” above: lowpart_subreg can handle lowpart subword > >>> >> subregs of multiword values. > >>> > > >>> > I tried it. It didn't work since it caused the LRA failure. I > >>> > replaced > >>> > simplify_gen_subreg with lowpart_subreg instead. > >>> > >>> What specifically went wrong? > >> > >> With vector broadcast, for > >> --- > >> extern void *ops; > >> > >> void > >> foo (int c) > >> { > >> __builtin_memset (ops, c, 18); > >> } > >> --- > >> we generate HI from V16QI. With a single lowpart_subreg, I get > >> > >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > >> (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void > >> *)ops.0_1]+16 S2 A8]) > >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 > >> {*movhi_internal} > >> (nil)) > >> > >> instead of > >> > >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > >> (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void > >> *)ops.0_1]+16 S2 A8]) > >> (subreg:HI (reg:DI 51 xmm15) 0)) "s2a.i":6:3 76 {*movhi_internal} > >> (nil)) > >> > >> IRA and LRA fail to reload: > >> > >> (insn 10 9 0 2 (set (mem:HI (plus:DI (reg/f:DI 84) > >> (const_int 16 [0x10])) [0 MEM <char[1:18]> [(void > >> *)ops.0_1]+16 S2 A8]) > >> (subreg:HI (reg:V16QI 51 xmm15) 0)) "s2a.i":6:3 76 > >> {*movhi_internal} > >> (nil)) > >> > >> since ix86_can_change_mode_class has > >> > >> if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass)) > >> { > >> /* Vector registers do not support QI or HImode loads. If we don't > >> disallow a change to these modes, reload will assume it's ok to > >> drop the subreg from (subreg:SI (reg:HI 100) 0). This affects > >> the vec_dupv4hi pattern. */ > >> if (GET_MODE_SIZE (from) < 4) > >> return false; > >> } > > > > Ah! OK. In that case, maybe we should have something like: > > > > if (REG_P (prev_rtx) > > && HARD_REGISTER_P (prev_rtx) > > && REG_CAN_CHANGE_MODE_P (REGNO (prev_rtx), prev->mode, mode)) > > Sorry, make that last line: > > && lowpart_subreg_regno (REGNO (prev_rtx), prev->mode, mode) < 0 > > where lowpart_subreg_regno is a new wrapper around simplify_subreg_regno > that uses subreg_lowpart_offset (mode, prev->mode) as the offset.
Fixed. I submitted the v3 patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575670.html Thanks. > Thanks, > Richard > > > prev_rtx = copy_to_reg (prev_rtx); > > > > and then just have the single lowpart_subreg after that. > > > > Thanks, > > Richard -- H.J.