"Yangfei (Felix)" <felix.y...@huawei.com> writes: > Hi, > >> -----Original Message----- >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> Sent: Sunday, May 31, 2020 12:01 AM >> To: Yangfei (Felix) <felix.y...@huawei.com> >> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubiz...@gmail.com>; Jakub >> Jelinek <ja...@redhat.com>; Hongtao Liu <crazy...@gmail.com>; H.J. Lu >> <hjl.to...@gmail.com> >> Subject: Re: [PATCH PR95254] aarch64: gcc generate inefficient code with >> fixed sve vector length >> > > Snip... > >> > >> > The v5 patch attached addressed this issue. >> > >> > There two added changes compared with the v4 patch: >> > 1. In candidate_mem_p, mov_optab for innermode should be available. >> > In this case, mov_optab for SDmode is not there and subreg are added >> back by emit_move_insn_1. So we won't get the benefit with the patch. >> >> I agree we should have this check. I think the rule applies to all of the >> transforms though, not just the mem one, so we should add the check to the >> register and constant cases too. > > OK. I changed to make this an extra condition for calculating x_inner & y > _inner.
Sounds good. Maybe at this point the x_inner and y_inner code is getting complicated enough to put into a lambda too: x_inner = ... (x); y_inner = ... (y); Just a suggestion though. >> > 2. Instead of using adjust_address, I changed to use adjust_address_nv to >> avoid the emit of invalid insn 13. >> > The latter call to validize_mem() in emit_move_insn will take care of >> > the >> address for us. >> >> The validation performed by validize_mem is the same as that performed by >> adjust_address, so the only case this should make a difference is for >> push_operands: > > True. > >> /* If X or Y are memory references, verify that their addresses are valid >> for the machine. */ >> if (MEM_P (x) >> && (! memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0), >> MEM_ADDR_SPACE (x)) >> && ! push_operand (x, GET_MODE (x)))) >> x = validize_mem (x); >> >> if (MEM_P (y) >> && ! memory_address_addr_space_p (GET_MODE (y), XEXP (y, 0), >> MEM_ADDR_SPACE (y))) >> y = validize_mem (y); >> >> So I think the fix is to punt on push_operands instead (and continue to use >> adjust_address rather than adjust_address_nv). > > Not sure if I understand it correctly. > Do you mean excluding push_operand in candidate_mem_p? Like: > > 3830 auto candidate_mem_p = [&](machine_mode innermode, rtx mem) { > 3831 return !targetm.can_change_mode_class (innermode, GET_MODE (mem), > ALL_REGS) > 3832 && !push_operand (mem, GET_MODE (mem)) > 3833 /* Not a candiate if innermode requires too much alignment. > */ > 3834 && (MEM_ALIGN (mem) >= GET_MODE_ALIGNMENT (innermode) > 3835 || targetm.slow_unaligned_access (GET_MODE (mem), > 3836 MEM_ALIGN (mem)) > 3837 || !targetm.slow_unaligned_access (innermode, MEM_ALIGN > (mem))); > 3838 }; Yeah, looks good. Formatting nit though: multi-line conditions should be wrapped in (...), i.e.: return (... && ... && ...); Thanks, Richard