On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >> >>
> >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
> >> >> <richard.sandif...@arm.com> wrote:
> >> >> >
> >> >> > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
> >> >> > > <richard.sandif...@arm.com> wrote:
> >> >> > >>
> >> >> > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo registers
> >> >> > >> > so that associated hard registers can be properly spilled onto
> >> >> > >> > stack.  But there are cases where associated hard registers
> >> >> > >> > will never be spilled onto stack.  gen_reg_rtx is changed to
> >> >> > >> > take an argument for register alignment so that stack
> >> >> > >> > realignment can be avoided when not needed.
> >> >> > >>
> >> >> > >> How is it guaranteed that they will never be spilled though?
> >> >> > >> I don't think that that guarantee exists for any kind of pseudo,
> >> >> > >> except perhaps for the temporary pseudos that the RA creates to
> >> >> > >> replace (match_scratch …)es.
> >> >> > >>
> >> >> > >
> >> >> > > The caller of creating pseudo registers with specific alignment must
> >> >> > > guarantee that they will never be spilled.  I am only using it in
> >> >> > >
> >> >> > >   /* Make operand1 a register if it isn't already.  */
> >> >> > >   if (can_create_pseudo_p ()
> >> >> > >       && !register_operand (op0, mode)
> >> >> > >       && !register_operand (op1, mode))
> >> >> > >     {
> >> >> > >       /* NB: Don't increase stack alignment requirement when forcing
> >> >> > >          operand1 into a pseudo register to copy data from one memory
> >> >> > >          location to another since it doesn't require a spill.  */
> >> >> > >       emit_move_insn (op0,
> >> >> > >                       force_reg (GET_MODE (op0), op1,
> >> >> > >                                  (UNITS_PER_WORD * BITS_PER_UNIT)));
> >> >> > >       return;
> >> >> > >     }
> >> >> > >
> >> >> > > for vector moves.  RA shouldn't spill it.
> >> >> >
> >> >> > But this is the point: it's a case of hoping that the RA won't spill
> >> >> > it, rather than having a guarantee that it won't.
> >> >> >
> >> >> > Even if the moves start out adjacent, they could be separated by later
> >> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA
> >> >> > scheduling isn't enabled by default for x86, but it can still be
> >> >> > enabled explicitly.)
> >> >> > Or if the same data is being copied to two locations, we might reuse
> >> >> > values loaded by the first copy for the second copy as well.
> >> >
> >> > There are cases where pseudo vector registers are created as pure
> >> > temporary registers in the backend and they shouldn't ever be spilled
> >> > to stack.  They will be spilled to stack only if there are other
> >> > non-temporary vector register usage in which case stack will be
> >> > properly re-aligned.
> >> > Caller of creating pseudo registers with specific alignment guarantees
> >> > that they are used only as pure temporary registers.
> >>
> >> I don't think there's really a distinct category of pure temporary
> >> registers though.  The things I mentioned above can happen for any
> >> kind of pseudo register.
> >
> > I wonder if for the cases HJ thinks of it is appropriate to use hardregs?
> > Do we generally handle those well?  That is, are they again subject
> > to be allocated by RA when no longer live?
>
> Yeah, using hard registers should work.  Of course, any given fixed choice
> of hard register has the potential to be suboptimal in some situation,
> but it should be safe.

I tried hard registers.  The generated code isn't as good as with pseudo
registers.  But I want to avoid aligning the stack when YMM registers are
used only to inline memcpy/memset.  Any suggestions?

Thanks.

--
H.J.