On Tue, Mar 18, 2008 at 09:40:49PM +0000, Richard Sandiford wrote:
> > The most natural layout would be 0x45??0123 .
> > But you could also have 0x345?012? , or even more exotic mappings.
> 
> Do we actually support the second mapping though?  Surely the
> target-independent code needs to know how bytes are divided into words?

I don't see why the target-independent code would need to know what the bits
inside a partial integer mode mean.
A partial exception to this is when arithmetic for partial integers has to
be implemented using arithmetic for integral integers; in this case, it is
assumed that moving partial integers to integral integers, performing the
arithmetic, and moving back to partial integers will produce the right result.
So, if partial integer addition or subtraction is present, and no named
pattern for these operations exists, this implies that valid bits are
contiguous, and that any unused lower bits will read as zero (assuming we
are actually dealing with bits here.  Stranger scenarios are possible,
e.g. mod 81 arithmetic.)
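
To illustrate the assumption above, here is a minimal sketch in C.  It
assumes a hypothetical 24-bit partial integer kept in the upper 24 bits of
a 32-bit word, with the unused low 8 bits reading as zero; the layout and
all names (PARTIAL_SHIFT, partial_add, ...) are made up for illustration
and are not taken from GCC or any port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 24 valid bits in bits 31..8, bits 7..0 unused.  */
#define PARTIAL_SHIFT 8u
#define PARTIAL_UNUSED_MASK 0xFFu

/* Place a 24-bit value into the assumed partial-integer layout.  */
static uint32_t partial_from_value(uint32_t v)
{
    return (v & 0xFFFFFFu) << PARTIAL_SHIFT;
}

/* Recover the 24-bit value from the assumed layout.  */
static uint32_t partial_to_value(uint32_t p)
{
    return p >> PARTIAL_SHIFT;
}

/* Add two partial integers using plain full-width addition.  This only
   gives the right answer because the valid bits are contiguous and the
   unused low bits of both operands are zero, so no carry can enter the
   valid bits from below.  */
static uint32_t partial_add(uint32_t a, uint32_t b)
{
    assert((a & PARTIAL_UNUSED_MASK) == 0);
    assert((b & PARTIAL_UNUSED_MASK) == 0);
    return a + b;   /* wraps mod 2^32, i.e. mod 2^24 in the valid bits */
}
```

If the unused bits were allowed to hold garbage, or sat in the middle of
the word, the full-width add could carry into or across the valid bits, and
the port would have to supply its own pattern.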

> The reason Kenny's looking at this is that he wants to track which
> bytes in a SUBREG are actually live.

A conservative assumption is that all bits occupied by the integral mode the
partial integral mode is associated with are live.  If we really find that
there is a code quality issue when making this assumption, we can add a hook
to define the salient semantics, but I doubt this will come up.

> >> 3) What about things like 80-bit FP modes on a 32-bit or 64-bit target? Is 
> >> it valid to refer to pieces of an 80-bit FP pseudo? If so, are the rules 
> >> we've got here right?
> >
> > Where the 80-bit mode is stored in multiple words like for x86, you
> > should be able to refer to word_mode subregs the way the value is
> > stored in memory.  This is the only way you can get a sane equivalence
> > between reloads via secondary memory and direct register-register
> > moves involving word_mode GENERAL_REGS.
> 
> OK, so in all these cases, "N words and a bit" modes can be treated
> like "N + 1 words, with the upper bits undefined"?  For both inner
> and outer modes?

N + 1 words, yes, but it doesn't follow that it must be the upper bits
that are undefined.  If that is actually the case, however, for an 80 bit
value on a little-endian byte-addressed target, the port could refer
to the bits in the highest words as (subreg:HI (reg:XF inner_reg) 8) or
(subreg:HI (mem:XF mem_addr) 8) to make this explicit.
However, what would we do with a true-blue big endian target?
Would the highest bits be (subreg:HI (reg:XF inner_reg) 2) ?

> >> 4) Do stores to subregs of hardreg invalidate just the registers
> >> mentioned in the outer mode or do they invalidate the entire set of
> >> registers mentioned in the inner mode? (our rules say only the outer
> >> mode).
> >
> > Where the hardreg is actually a single hardware register, all of it is
> > clobbered.  If it is a concatenation of multiple actual hard
> > registers, the idea is that only the one that corresponds to the word
> > that is stored into gets clobbered.  If more than one word is stored
> > into, that would logically translate to changing each of the registers
> > that each word corresponds to.
> >
> > What seems less defined is what happens when the underlying hard registers
> > are smaller than a word, and either the mode size or SUBREG_BYTE
> > is not a multiple of a word.
> 
> Yeah, my version of the question was more: do we support subregs of
> hard registers in which the normal word-based semantics of pseudos
> do not apply?

Having some data registers larger than word size is quite common,
particularly floating point registers on machines with a word size
smaller than the largest supported floating point mode.

IIRC we support this, but not very well.

Where the hardware allows transfers between differently sized registers,
it seems most natural to use SUBREGs to express this.

IIRC you have to do something like (SUBREG:SI (SUBREG:DI (REG:DF...
and even spread it across multiple instruction patterns.
I don't see why we should be picky about the MODE_CLASS of inner or
outer modes of SUBREGs.

If individual portions of multiple-word registers can be accessed individually
like normal registers, it makes sense to model the individual parts as
separate registers, but it is essential that all parts can be both
read from and written to separately with moves from/to general purpose
registers to make this work sanely.  Also, group spill allocation
has extra costs in several ways, so if the predominant way to use the
wide registers is to use them as a whole, it is still desirable to
model them as wide registers and have the narrower accesses use
SUBREG and/or zero_extract.

But there is also part of an answer here for the original question:
when a wide register is only partially available as separate words,
it is more likely to be available as separate values to read.
If you can't write separate parts separately, it follows that a subreg
write would naturally clobber the entire register.
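
As a rough sketch of the rule discussed here, the following C fragment
computes which hard registers a subreg store would clobber.  Everything in
it is hypothetical: the struct, the field names and clobbered_regs are made
up for illustration and do not correspond to any GCC interface.  It also
assumes that the byte offset maps to registers in ascending order.

```c
#include <assert.h>

/* Hypothetical description of a store through a SUBREG of a hard reg.  */
struct subreg_store {
    unsigned first_regno;  /* first hard register holding the inner value */
    unsigned nregs;        /* number of hard registers the inner mode uses */
    unsigned reg_bytes;    /* size of each hard register in bytes */
    unsigned offset;       /* byte offset of the stored part */
    unsigned size;         /* size of the outer mode in bytes */
    int parts_writable;    /* nonzero if each part can be written alone */
};

/* Inclusive range of hard register numbers.  */
struct reg_range {
    unsigned first;
    unsigned last;
};

/* Return the registers clobbered by the store.  If parts cannot be
   written separately, the whole inner value is clobbered; otherwise only
   the registers overlapped by the stored bytes are.  */
static struct reg_range clobbered_regs(const struct subreg_store *s)
{
    struct reg_range r;

    assert(s->nregs > 0 && s->reg_bytes > 0 && s->size > 0);
    assert(s->offset + s->size <= s->nregs * s->reg_bytes);

    if (!s->parts_writable) {
        r.first = s->first_regno;
        r.last = s->first_regno + s->nregs - 1;
    } else {
        r.first = s->first_regno + s->offset / s->reg_bytes;
        r.last = s->first_regno + (s->offset + s->size - 1) / s->reg_bytes;
    }
    return r;
}
```

With nregs == 1 both branches give the same answer, which matches the
single-hardware-register case where all of the register is clobbered.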

There is a problem, though, with considering zero_extract as an escape hatch
if you do want to access only part of the register in special circumstances:
the documentation says that applied on memory, the inner mode must be
byte-sized - this will certainly be violated in reload - and that for
registers, the mode will be that of extv / insv.  Not all processors have
extv / insv instructions, and even if they had, you might need more than
one inner mode in different circumstances.  Why are we making any
stipulations about the inner mode?

> The current documentation expressly forbids taking
> an SImode subreg of a DImode hard register on a 32-bit machine,

Huh?  Then all our 32 bit ports which support long long must be broken.

> for example, and I agree that the subword hard register case is
> also suspicious.

I suppose it just doesn't happen often enough for anybody to have any
strong opinion one way or the other.  I suppose you can always express this
with a zero_extract, so it would only become important if we had to
worry about memory footprint of or processing time for zero_extract.

So, pragmatically, I suppose we should go with whatever prohibition or
definition allows the fastest implementation.

> Without wanting to fan flames, isn't this something that should
> be fixed in reload? ;)  Reload is amenable to change...

We've already discussed this 16 months ago:
http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01074.html

FWIW, I did a small reload patch to my experimental local sources yesterday
to tinker with reload types for a 0.2% size gain.
