Re: subreg vs vec_select

Richard Sandiford Fri, 11 Sep 2020 04:15:19 -0700

Ilya Leoshkevich <i...@linux.ibm.com> writes:
> On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote:
>> On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
>> > Ilya Leoshkevich via Gcc <gcc@gcc.gnu.org> writes:
>> > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
>> > > > Hi Ilya,
>> > > > 
>> > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via
>> > > > Gcc
>> > > > wrote:
>> > > > > I have a vector pseudo containing a single 128-bit value
>> > > > > (V1TFmode)
>> > > > > and
>> > > > > I need to access its last 64 bits (DFmode). Which of the two
>> > > > > options
>> > > > > is better?
>> > > > > 
>> > > > > (subreg:DF (reg:V1TF) 8)
>> > > > > 
>> > > > > or
>> > > > > 
>> > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel
>> > > > > [(const_int
>> > > > > 1)]))
>> > > > > 
>> > > > > If I use the first one, I run into a problem with set_noop_p
>> > > > > (): it
>> > > > > thinks that
>> > > > > 
>> > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
>> > > > > 
>> > > > > is a no-op, because it doesn't check the mode after stripping
>> > > > > the
>> > > > > subreg:
>> > > > > 
>> > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
>> > > > > 
>> > > > > However this is not correct, because SET_DEST is the second
>> > > > > register in
>> > > > > a register pair, and SET_SRC is half of a vector register
>> > > > > that
>> > > > > overlaps
>> > > > > the first register in the corresponding pair. So it looks as
>> > > > > if
>> > > > > mode
>> > > > > needs to be considered there.
>> > > > 
>> > > > Yes.
>> > > > 
>> > > > > This helps:
>> > > > > 
>> > > > > --- a/gcc/rtlanal.c
>> > > > > +++ b/gcc/rtlanal.c
>> > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
>> > > > >         return 0;
>> > > > >        src = SUBREG_REG (src);
>> > > > >        dst = SUBREG_REG (dst);
>> > > > > +      if (GET_MODE (src) != GET_MODE (dst))
>> > > > > +       return 0;
>> > > > >      }
>> > > > > 
>> > > > > but I'm not sure whether I'm not missing something about
>> > > > > subreg
>> > > > > semantics in the first place.
>> > > > 
>> > > > You probably should just see if both modes are the same number
>> > > > of
>> > > > hard
>> > > > registers?  HARD_REGNO_NREGS.
>> > > 
>> > > I've refined my patch as follows:
>> > > 
>> > > --- a/gcc/rtlanal.c
>> > > +++ b/gcc/rtlanal.c
>> > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
>> > >         return 0;
>> > >        src = SUBREG_REG (src);
>> > >        dst = SUBREG_REG (dst);
>> > > +      if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
>> > > +         && HARD_REGISTER_P (dst)
>> > > +         && hard_regno_nregs (REGNO (src), GET_MODE (src))
>> > > +                != hard_regno_nregs (REGNO (dst), GET_MODE
>> > > (dst)))
>> > > +       return 0;
>> > >      }
>> > 
>> > I think checking the mode would be safer.  Having the same number
>> > of registers doesn't mean that the bits are distributed across the
>> > registers in the same way.
>> 
>> Yeah, that's what I was trying to express with this hypothetical
>> machine example.  On the other hand, checking mode is too
>> pessimistic.
>> E.g. if we talk about s390 GPRs, then considering
>> 
>> (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))
>
> Sorry, bad example: here the hard register modes actually match.
> But it's probably possible to come up with something similar, where
> the hard reg is accessed with different modes, but when we add subregs
> on top, then we end up selecting the same bits.


Wel, in general, (subreg (reg …) …) shouldn't exist for hard registers.
They should just be folded to the underlying (reg …) instead.
So in the above I'd expect to see:

  (set (reg:SI %r0) (reg:SI %r0))

instead.  I remember there were cases on e.g. e500 where the target
prevented the fold from taking place due to layout issues, but wasn't
sure whether the above FP example above was something similar.

So…

>> a no-op is fine from my perspective.  So having a more restrictive
>> check might be desirable.  Is there a way to ask the backend how the
>> subreg bits are distributed?

…if the bits are distributed evenly in an obvious way, the hard register
subreg should already have been folded to a reg.  I think subregs of
hard registers are rare enough that we should just keep it simple
and safe.

>> > Out of interest, why can't the subregs in the example above get
>> > folded down to hard registers?
>> 
>> I think this is because the offsets are not 0.  I could imagine
>> folding
>> (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
>> backend hook for this.  Does anything like this exist?  Also, can
>> (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
>> the second doubleword of 128-bit %v0 vector register.

For multi-register values with a regular layout, simplify_subreg_regno
(via subreg_get_info) uses various target queries to work out which
hard registers are actually being accessed.  Is the subreg above
for big-endian or little-endian?  (I hope big-endian given the
8 offset.)

Thanks,
Richard

Re: subreg vs vec_select

Reply via email to