Re: [trunk] Addition to subreg section of rtl.text.

Richard Sandiford Thu, 20 Mar 2008 03:40:16 -0700

Sorry for snipping a lot, but I think the important bit was...

Joern Rennecke <[EMAIL PROTECTED]> writes:
> But the SUBREGS and ZERO_EXTRACTs should still mean the same with respect to
> selecting groups of bits.  You simply don't know which of them mean anything
> and what their positional value is, if any, but you shouldn't need to.
> So in that respect, it still behaves "as if" the natural byte order applies.


OK, so I think you're saying (here and elsewhere) that partial modes
behave "as if" their widths were rounded up to the next word boundary,
but that an unspecified collection of bits in the extended width will
read as undefined?  Thus just as:

  (set (subreg:HI (reg:QI ...) ...) (const_int 0))

does not guarantee that (subreg:HI (reg:QI ...) ...) has the value 0,
you're saying that, for any valid values of M and X:

  (set (subreg:M (reg:N ...) X) (const_int 0))

does not guarantee that (subreg:M (reg:N ...) ...) has the value 0
if N is a partial mode?

Sounds OK to me FWIW.  But we should spell this out in the
documentation.
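
For illustration only, here is a toy Python model of that reading. It is
not GCC code, and the names readback_after_store, guaranteed_equal and
meaningful_mask are hypothetical. It assumes the target-specific set of
meaningful bits can be written as a mask, which generic code would not
actually know.

```python
def readback_after_store(value, width_bits, meaningful_mask):
    """Model a store of `value` through a width_bits-wide subreg.

    meaningful_mask is an ASSUMPTION: the bits of the access that the inner
    register really holds. All other bits read back as undefined.
    Returns (known_value, known_mask).
    """
    full_mask = (1 << width_bits) - 1
    known_mask = meaningful_mask & full_mask
    return value & known_mask, known_mask


def guaranteed_equal(value, width_bits, meaningful_mask):
    """True only if every bit of the read is defined, so that the read is
    guaranteed to return the value that was stored."""
    _, known_mask = readback_after_store(value, width_bits, meaningful_mask)
    return known_mask == (1 << width_bits) - 1


# (set (subreg:HI (reg:QI ...) ...) (const_int 0)): only 8 of 16 bits are held.
paradoxical_hi_of_qi = guaranteed_equal(0, 16, 0xFF)
# An access no wider than the bits actually held is fully defined.
qi_access = guaranteed_equal(0, 8, 0xFF)
```

Here paradoxical_hi_of_qi is False and qi_access is True. Under the
proposed wording, a partial mode N behaves like the first case for some
mask that the generic code cannot see.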

>> Yes, MIPS is one such port, but we expressly forbid conversions
>> between full-width and partial-width modes for the very reason
>> given in mainline rtl.texi:
>> 
>>     It is also not valid to access a single word of a multi-word value in a
>>     hard register when less registers can hold the value than would be
>>     expected from its size.  For example, some 32-bit machines have
>>     floating-point registers that can hold an entire @code{DFmode} value.
>>     If register 10 were such a register @code{(subreg:SI (reg:DF 10) 4)}
>>     would be invalid because there is no way to convert that reference to
>>     a single machine register.  The reload pass prevents @code{subreg}
>>     expressions such as these from being formed.
>
> The reasoning there is flawed.  You could still identify a specific hard
> register when you are presented with a DFmode subreg of a DCmode or V2DFmode
> inner register.
> And @code{(subreg:SI (reg:DF 10) 0)} would be a natural way to express that
> you are using the floating point register as a 32 bit integer register,
> with writes clobbering the entire 64 bit of the register.

Yes, this is one possible definition.  But there's no reason in this
situation why you couldn't just use a single REG.  Why use subregs at all?

I thought in the earlier post, you were suggesting that it should be
OK to represent a doubleword register that has individually-addressable
words as a single register if most accesses were of the doubleword variety.
I thought you were then saying that you could use (subreg:SI (reg:DF ...) ...)
to refer to the individually-addressable parts.  In that scenario you
_wouldn't_ want (subreg:SI (reg:DF 10) 0) to clobber the whole register.

Which brings us back to Richard K's point about phasing out hard subregs
completely.

>> > IIRC you have to do something like (SUBREG:SI (SUBREG:DI (REG:DF...
>> > and even spread it across multiple instruction patterns.
>> > I don't see why we should be picky about the MODE_CLASS of inner or
>> > outer modes of SUBREGs.
>> 
>> My understanding was that nested subregs aren't allowed (any more).
>
> That's why I talked about spreading it across multiple instruction patterns.
> Unfortunately that can leave you with multiple machine instructions
> where one would do, just because the middle-end is in denial that these
> things might exist.

It just seems to me that, by the time you get to the stage of having
multiple instructions for a single write, you've lost any advantage
you've gained by avoiding unspecs.

Do any mainline ports do this, or are you talking about private ports?

>> > registers to make this work sanely.  Also, group spill allocation
>> > has extra costs in several ways, so if the predominant way to use the
>> > wide registers is to use them as a whole, it is still desirable to
>> > model them as wide registers and have the narrower accesses use
>> > SUBREG and/or zero_extract.
>> 
>> As above, I think the rtl.texi documentation makes this invalid (and this
>> is a long-standing restriction).
>
> Well, Kenny also asked 'are the rules we've got here right?'.

I think this was more a case of "do the rules we wrote reflect
the current situation correctly".  We weren't wanting to _change_
the semantics, but simply asking whether we'd pinned down the
current semantics.

> I think some of the rules are overly restrictive, and prevent gcc
> from achieving its full potential for generating efficient code.
> Moreover, if a port has an extv / insv pattern that matches in mode with the
> wide registers, it can legitimately use the zero_extract route.  It's
> reload that contradicts the documentation in changing registers into MEMs
> and thus creating zero_extracts from wide MEMs.

It sounds like you might be referring to both the subreg and extract
documentation here.  As far as the subreg documentation goes,
let's assume that what I said above about partial modes is right
(you'll have already corrected me by now if not).  If we change the
rules to say that, what do you think is still overly restrictive?
A specific edit to our rtl.texi proposal would probably be helpful
at this stage.

E.g. one possibility would be to drop:

    If @var{reg} is a hard register, the @code{subreg} must also represent
    the lowpart of a particular hard register, or represent one or more
    complete hard registers.

and instead say that the word-based semantics for pseudo registers also
apply to hard registers, regardless of the number of hard registers in
the inner register.  This would in some ways be simpler.
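
As a rough sketch of what the rule being dropped checks (a toy model
rather than anything in GCC; hard_subreg_ok and its parameters are
hypothetical, sizes are in bytes, and it assumes a little-endian target
where the lowpart of a hard register sits at offset 0 within it):

```python
def hard_subreg_ok(inner_size, outer_size, byte, reg_size):
    """Check a subreg of a hard register against the quoted rule.

    inner_size: size of the inner mode
    outer_size: size of the subreg's own mode
    byte:       the subreg byte offset
    reg_size:   bytes held by each hard register of the inner value
    ASSUMPTION: little-endian, so a lowpart starts at offset 0 in a register.
    """
    # The access must be aligned to its own size and lie inside the inner value.
    if byte % outer_size != 0 or byte + outer_size > inner_size:
        return False
    # Both allowed forms start at the beginning of some hard register.
    if byte % reg_size != 0:
        return False
    covers_complete_registers = outer_size % reg_size == 0
    is_lowpart_of_one_register = outer_size < reg_size
    return covers_complete_registers or is_lowpart_of_one_register


# (subreg:SI (reg:DF 10) 4) where register 10 holds the whole DFmode value.
high_word_of_wide_fpr = hard_subreg_ok(8, 4, 4, 8)
# The same subreg when the DFmode value occupies two 4-byte registers.
high_word_of_reg_pair = hard_subreg_ok(8, 4, 4, 4)
```

high_word_of_wide_fpr is False and high_word_of_reg_pair is True. With
the alternative wording, the first case would no longer be rejected by
this rule and would take the word-based meaning instead.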

>> > There is a problem, though, with considering zero_extract as an escape hatch
>> > if you do want to access only part of the register in special circumstances:
>> > the documentation says that applied on memory, the inner mode must be
>> > byte-sized - this will certainly be violated in reload - and that for
>> > registers, the mode will be that of extv / insv.  Not all processors have
>> > extv / insv instructions, and even if they had, you might need more than
>> > one inner mode in different circumstances.  Why are we making any
>> > stipulations about the inner mode?
>> 
>> I think one reason is that allowing zero_extracts of multi-word modes is
>> (like this subreg thing) a little hard to pin down.  What happens when
>> WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN on a 32-bit target, and you have:
>> 
>>     (zero_extract (reg:DI ....) (const_int 16) (const_int 24))
>> 
>> (which should be BITS_BIG_ENDIAN-neutral).  0x76543210 would be laid out
>> in memory as "0x45670123", so is this extract equivalent to "0x70" or
>> "0x43"?  You could probably make a case for both, and I doubt the
>> target-independent code handles this consistently at the moment.

Apologies once again for messing up the hex numbers.  Read them as
a string of bytes rather than a string of nybbles.

(FWIW, it started out as an example that involved lsb-based byte
indices rather than hex numbers, but when sending the example privately
to Kenny, I converted it to hex and switched to a 16-bit target.
Unfortunately, I forgot one half of the conversion when reproducing
the example here...)
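
To make the byte-string reading concrete, here is a small Python sketch
of the two interpretations. It is a toy model with hypothetical function
names, not target-independent GCC code. Bytes are labelled by
significance (7 is the most significant byte of the DImode value), and
the target is assumed to have 4-byte words.

```python
WORD_BYTES = 4  # ASSUMPTION: 32-bit target, as in the quoted example


def memory_layout(n_bytes=8, word_bytes=WORD_BYTES):
    """Significance label of the byte at each address when
    WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN."""
    layout = []
    n_words = n_bytes // word_bytes
    for word in reversed(range(n_words)):   # most significant word first
        for byte in range(word_bytes):      # least significant byte first
            layout.append(word * word_bytes + byte)
    return layout


def extract_by_value(pos_bits, size_bits):
    """Bytes selected if the position counts bits of the 64-bit value."""
    first = pos_bits // 8
    return list(range(first, first + size_bits // 8))


def extract_by_address(pos_bits, size_bits):
    """Bytes selected if the position is a contiguous offset into memory."""
    first = pos_bits // 8
    return memory_layout()[first:first + size_bits // 8]


layout = memory_layout()                 # [4, 5, 6, 7, 0, 1, 2, 3]
by_value = extract_by_value(24, 16)      # [3, 4]
by_address = extract_by_address(24, 16)  # [7, 0]
```

The layout corresponds to the "45670123" byte string, and the two
results correspond to the "43" and "70" candidates: the same
zero_extract picks bytes {3, 4} under one reading and bytes {7, 0} under
the other.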

> Huh?  The documentation says that zero_extract follows BITS_BIG_ENDIAN,
> so the memory layout doesn't come into play.  We have a 64 bit value,
> and BITS_BIG_ENDIAN determines which bits are meant.

So you're saying that, if the above REG:DI were replaced by a MEM:DI,
the zero_extract would represent a non-contiguous bitrange?
(Yes, the documentation suggests byte_mode for MEMs, but the SH port
uses zero_extracts of SImode MEMs as well, so presumably we're supposed
to support other modes besides the documented ones.)

>> (In other words, I was supporting the change in behaviour but
>> opposing the addition of a new hook.)
>
> Which would leave the SH out in the cold.  Not being an SH maintainer
> anymore, I can live with that.

It would leave SH out in the cold because you can no longer deal with
the reverse-endian ordering of the FPRs?  Like I said in my original
reply, I think a hook in the matching logic is not the right place to
expose that.  Supporting reverse-ordered registers in this place only,
and leaving the rest of rtl optimisers with the false belief that the
registers have the natural ordering, doesn't seem like a clean design.
We should either expose the endianness difference properly or continue
to prevent the formation of word accesses.

(FWIW, MIPS used to have the same issue with paired HI/LO values, and
had to prevent all word <=> superword mode changes on little-endian
targets.  It's now been fixed to use the natural ordering for all
targets, thanks to Nigel Stephens.)

Richard
