On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> > Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> > survive register allocation.  This in turn leads to wrong register
> > renaming.  Keeping the current approach would mean we need two insns
> > for
> > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
> > lines
> > 
> > (define_insn "*tf_to_fprx2_0"
> >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand"
> > "=f") 0)
> >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> >                    UNSPEC_TF_TO_FPRX2_0))]
> >   "TARGET_VXE"
> >   "#")
> > 
> > (define_insn "*tf_to_fprx2_0"
> >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> >                    UNSPEC_TF_TO_FPRX2_0))]
> >   "TARGET_VXE"
> >   "vpdi\t%v0,%v1,%v0,1
> >   [(set_attr "op_type" "VRR")])
> > 
> > and similar for *tf_to_fprx2_1.  Note, pre register allocation
> > operand 0
> > has mode FPRX2 and afterwards DF once subregs have been eliminated.
> > 
> > Since we always copy a whole vector register into a floating-point
> > register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> > *tf_to_fprx2_1 into a single insn which means we don't have to use
> > subregs at all.  The downside of this is that the assembler template
> > contains two instructions, now.  The upside is that we don't have to
> > come up with some artificial insn before RA which might be more
> > readable/maintainable.  That is implemented by this patch.
> > 
> > In commit r11-4872-ge627cda5686592, the output operand specifier %V
> > was
> > introduced which is used in tf_to_fprx2 only, now.  I didn't come up
> > with its counterpart like %F for floating-point registers.  Instead I
> > printed the register pair in the output function directly.  This
> > spares
> > us a new and "rare" format specifier for a single insn.  I don't have
> > a
> > strong opinion which option to choose, however, we should either add
> > %F
> > in order to mimic the same behaviour as %V or getting rid of %V and
> > inline the logic in the output function.  I lean towards the latter.
> > Any preferences?
> > ---
> >  gcc/config/s390/s390.md                    |  2 +
> >  gcc/config/s390/vector.md                  | 66 +++++++++++---------
> > --
> >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +++++++++
> >  3 files changed, 60 insertions(+), 34 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> 
> [...]
> 
> > +  char buf[64];
> > +  switch (which_alternative)
> > +    {
> > +    case 0:
> > +      if (REGNO (operands[0]) == REGNO (operands[1]))
> > +   return "vpdi\t%V0,%v1,%V0,5";
> > +      else
> > +   return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > +    case 1:
> > +      {
> > +   const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> > +   snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > reg_pair);
> 
> I wonder if there is a corner case where 8+ does not fit into short
> displacement?

That is covered by constraint AR, i.e., for short displacement, and AT
for long displacement.

Reply via email to