On Tue, Sep 07, 2021 at 06:07:30PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 07, 2021 at 03:12:36AM -0400, Michael Meissner wrote:
> > [PATCH] Fix SFmode subreg of DImode and TImode
> > 
> > This patch fixes the breakage in the PowerPC due to a recent change in 
> > SUBREG
> > behavior.
> 
> But what was that change?  And was that intentional?  If so, why wasn't
> it documented, was the existing behaviour considered buggy?  But the
> documentation agrees with the previous behaviour afaics.

I haven't investigated what the machine independent change was that suddenly
started allowing:

        (subreg:SF (reg:TI ...))

This patch just allows the subreg's to exist instead of getting an insn not
found message.  An alternative approach would be to allow wider subregs in the
insn matching and deal with getting the right physical register when
splitting.

> > While it is arguable that the patch that caused the breakage should
> > be reverted, this patch should be a bandage to prevent these changes from
> > happening again.
> 
> NAK.  This patch will likely cause us to generate worse code.  If that
> is not the case it will need a long, in-depth explanation of why not.

Trying to replicate the problem, I get exactly the same code with an older
compiler.  I will attach the test program as an attachment.

> Sorry.
> 
> > I first noticed it in building the Spec 2017 wrf_r and blender_r
> > benchmarks.  Once I applied this patch, I also noticed several of the
> > tests now pass.
> > 
> > The core of the problem is we need to treat SUBREG's of SFmode and SImode
> > specially on the PowerPC.  This is due to the fact that SFmode values that 
> > are
> > in the vector and floating point registers are represented as DFmode.  When 
> > we
> > want to do a direct move between the GPR registers and the vector 
> > registers, we
> > have to convert the value from the DFmode representation to/from the SFmode
> > representation.
> 
> The core of the problem is that subreg of pseudos has three meanings:
>   -- Paradoxical subregs;
>   -- Actual subregs;
>   -- "bit_cast" thingies: treat the same bits as something else.  Like
>      looking at the bits of a float as its memory image.
> 
> Ignoring paradoxical subregs (as well as subregs of mem, which should
> have disappeared by now), and subregs of hard registers as well (those
> have *different* semantics after all), the other two kinds can be mixed,
> and *have to* be mixed, because subregs of subregs are non-canonical.

This isn't subregs of subregs.  This is a subreg of a larger integer type, for
example having a subreg representing an __int128_t or structure, and accessing
the bottom/top element.  Instead of doing an extract, the compiler generates a
subreg.

Before the machine independent part of the compiler would not generate these
subregs, and now it does.

> 
> Is there any reason why not to allow this kind of subreg?
> 
> If we want to not allow mixing bit_cast with subregs, we should make it
> its own RTL code.
> 
> > +      /* In case we are given a SUBREG for a larger type, reduce it to
> > +    SImode.  */
> > +      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
> > +   {
> > +     rtx tmp = gen_reg_rtx (SImode);
> > +     emit_move_insn (tmp, gen_lowpart (SImode, source));
> > +     emit_insn (gen_movsf_from_si (dest, tmp));
> > +     return true;
> > +   }
> 
> This makes it two separate insns.  Is that always optimised to code that
> is at least as good as before?

It is likely even if it generates an extra move, it will be better, because the
default would be to do the transfer via store on one register bank and load of
the other.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
union u_128 {
  __int128_t i128;
  long l[2];
  int i[4];
  double d[2];
  float f[4];
};

union u_64 {
  long l;
  int i[2];
  double d;
  float f[2];
};

float ret_u128_f0 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ y;

  return u.f[0];
}

float ret_u128_f1 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ y;

  return u.f[1];
}

float ret_u128_f2 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ x;

  return u.f[2];
}

float ret_u128_f3 (long a, long b, long x, long y)
{
  union u_128 u;

  u.l[0] = a ^ x;
  u.l[1] = b ^ x;

  return u.f[3];
}

float ret_u64_f0 (long a, long x)
{
  union u_64 u;

  u.l = a ^ x;

  return u.f[0];
}

float ret_u64_f1 (long a, long x)
{
  union u_64 u;

  u.l = a ^ x;

  return u.f[1];
}

Reply via email to