On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus
> > > wrote:
> > > > Currently subregs originating from *tf_to_fprx2_0 and
> > > > *tf_to_fprx2_1
> > > > survive register allocation.  This in turn leads to wrong
> > > > register
> > > > renaming.  Keeping the current approach would mean we need two
> > > > insns
> > > > for
> > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along
> > > > the
> > > > lines
> > > > 
> > > > (define_insn "*tf_to_fprx2_0"
> > > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand"
> > > > "=f") 0)
> > > >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > >                    UNSPEC_TF_TO_FPRX2_0))]
> > > >   "TARGET_VXE"
> > > >   "#")
> > > > 
> > > > (define_insn "*tf_to_fprx2_0"
> > > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > > >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > > >                    UNSPEC_TF_TO_FPRX2_0))]
> > > >   "TARGET_VXE"
> > > >   "vpdi\t%v0,%v1,%v0,1
> > > >   [(set_attr "op_type" "VRR")])
> > > > 
> > > > and similar for *tf_to_fprx2_1.  Note, pre register allocation
> > > > operand 0
> > > > has mode FPRX2 and afterwards DF once subregs have been
> > > > eliminated.
> > > > 
> > > > Since we always copy a whole vector register into a floating-
> > > > point
> > > > register pair, another way to fix this is to merge *tf_to_fprx2_0
> > > > and
> > > > *tf_to_fprx2_1 into a single insn which means we don't have to
> > > > use
> > > > subregs at all.  The downside of this is that the assembler
> > > > template
> > > > contains two instructions, now.  The upside is that we don't have
> > > > to
> > > > come up with some artificial insn before RA which might be more
> > > > readable/maintainable.  That is implemented by this patch.
> > > > 
> > > > In commit r11-4872-ge627cda5686592, the output operand specifier
> > > > %V
> > > > was
> > > > introduced which is used in tf_to_fprx2 only, now.  I didn't come
> > > > up
> > > > with its counterpart like %F for floating-point registers. 
> > > > Instead I
> > > > printed the register pair in the output function directly.  This
> > > > spares
> > > > us a new and "rare" format specifier for a single insn.  I don't
> > > > have
> > > > a
> > > > strong opinion which option to choose, however, we should either
> > > > add
> > > > %F
> > > > in order to mimic the same behaviour as %V or getting rid of %V
> > > > and
> > > > inline the logic in the output function.  I lean towards the
> > > > latter.
> > > > Any preferences?
> > > > ---
> > > >  gcc/config/s390/s390.md                    |  2 +
> > > >  gcc/config/s390/vector.md                  | 66 +++++++++++-----
> > > > ----
> > > > --
> > > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +++++++++
> > > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> > > 
> > > [...]
> > > 
> > > > +  char buf[64];
> > > > +  switch (which_alternative)
> > > > +    {
> > > > +    case 0:
> > > > +      if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > +       return "vpdi\t%V0,%v1,%V0,5";
> > > > +      else
> > > > +       return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > > +    case 1:
> > > > +      {
> > > > +       const char *reg_pair = reg_names[REGNO (operands[0]) +
> > > > 1];
> > > > +       snprintf (buf, sizeof (buf),
> > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > > reg_pair);
> > > 
> > > I wonder if there is a corner case where 8+ does not fit into short
> > > displacement?
> > 
> > That is covered by constraint AR, i.e., for short displacement, and
> > AT
> > for long displacement.
> 
> Don't they cover only %1, and not 8+%1? Can't there be a situation
> where %1 barely fits and 8+%1 doesn't fit? A quick glance shows that
> the code doesn't leave any allowance for this:
> 
> "AR"
>   s390_mem_constraint("AR")
>     s390_check_qrst_address('R')
>       s390_short_displacement()
>         INTVAL (disp) >= 0 && INTVAL (disp) < 4096

Isn't this covered by

int
s390_mem_constraint (const char *str, rtx op)
{
  char c = str[0];

  switch (c)
    {
    case 'A':
      /* Check for offsettable variants of memory constraints.  */
      if (!MEM_P (op) || MEM_VOLATILE_P (op))
        return 0;
      if ((reload_completed || reload_in_progress)
          ? !offsettable_memref_p (op) : !offsettable_nonstrict_memref_p (op))
        return 0;

where

/* Return true if OP is a memory reference whose address contains
   no side effects and remains valid after the addition of a positive
   integer less than the size of the object being referenced.

   We assume that the original address is valid and do not check it.

   This uses strict_memory_address_p as a subroutine, so
   don't use it before reload.  */

bool
offsettable_memref_p (rtx op)
{
  return ((MEM_P (op))
          && offsettable_address_addr_space_p (1, GET_MODE (op), XEXP (op, 0),
                                               MEM_ADDR_SPACE (op)));
}

guarantees that whatever we add to the address such that we stay inside the
object, the address is valid.  Or do I miss something?

Reply via email to