On Jul 30, 2007, at 11:06 AM, Evan Cheng wrote:
On Jul 30, 2007, at 12:02 AM, Christopher Lamb wrote:
On Jul 29, 2007, at 10:20 PM, Evan Cheng wrote:
On Jul 29, 2007, at 9:37 PM, Christopher Lamb wrote:
On Jul 29, 2007, at 6:20 PM, Evan Cheng wrote:
Sent from my iPhone
On Jul 28, 2007, at 4:36 PM, Christopher Lamb
<[EMAIL PROTECTED]> wrote:
On Jul 28, 2007, at 2:26 PM, Evan Cheng wrote:
On Jul 28, 2007, at 11:52 AM, Christopher Lamb
<[EMAIL PROTECTED]> wrote:
On Jul 28, 2007, at 1:48 AM, Evan Cheng wrote:
Very cool! I need to read it more carefully.
But I see you are lowering zext to a single insert_subreg.
Is that right? It won't zero out the top part, no?
It's only lowering (zext i32 to i64) to an insert_subreg on
x86-64 where all writes to 32-bit registers implicitly zero-
extend into the upper 32-bits.
I know. But thy mismatch semantically. A insert_subreg to the
lower part should not change the upper half. I think this is
only legal for anyext.
On x86-64 the semantics of a 2 operand i32 insert_subreg is
that the input super-value is implicitly zero. So in this
sense the insert isn't changing the upper half, it's just that
the upper half is being set to zero implicitly rather than
explicitly. If you'll notice the insert_subreg is a two
operand (implicit super value) not a three operand version. If
the insert were the three operand version, and the super value
as coming from an implicit def I'd agree with you, but it's not.
Ok, let's step back for a second. There are a couple of issues
that should be addressed. Plz help me understand. :)
1: Semantics of insert_subreg should be the same across all
targets, right?
I'm not certain that this should be so. x86-64 clearly has a
target specific semantics of a 32-bit into 64-bit insert.
No, that won't do. insert_subreg and extract_subreg are by
definition target independent. They must have the same semantics.
You are forcing x86-64 32-bit zero-extending move to fit
insert_subreg when they are really not the same thing.
If target independence is a requirement, then I agree that using
insert_subreg for x86-64 zero-ext isn't currently feasible.
I contend that insert_subreg is target specific already. It currently
requires a target specific subreg index, which is a kind of target
specific hook, that tells coalescing how to deal with it. A two
operand insert_subreg (or an insert_subreg from undef) is a move
between target register classes that have subreg relationship.
insert_subreg defines the entire superreg value, and I don't see why
it's so bad to allow targets to specify their own semantics for what
happens to the register being inserted into? This is essentially what
the subreg index is already, we could even put semantics in the
SubRegSet in the RegisterInfo.td allowing the semantics to be checked
by the compiler.
A parameter of the set that indicates that an insert into the subreg
i either leaves the rest of the superreg value untouched
(insert_subreg reg, reg, i), or it implicitly sets the rest of the
register to a known value (insert_subreg constant_value, reg, i), or
to undef (insert_subreg undef, reg, i). Only the specified semantics
for that SubRegSet of the register class of the result of the
insert_subreg would be valid, and could be ensured so.
This seems to me to allow insert_subreg to capture may useful cases,
and it captures the register set semantics in the RegisterInfo.td
file, where I think it belongs.
2: two operant variant of insert_subreg should mean the
superreg is undef. If you insert a value into a low part, the
rest of the superreg is still undef.
I think the meaning of insert_subreg instruction (both 2 and 3
operand versions) must have semantics specific to the target.
For example, on x86-64 there is no valid 3 operand insert_subreg
for a 32-bit value into 64-bits, because the 32-bit result is
always going to be zero extended and overwrite the upper 32-bits.
It just means there is no way to implement a insert_subreg with a
single instruction under x86-64. But that is perfectly ok. Apart
from anyext, x86-64 just isn't going to benefit from it. It's
also impossible to read or modify the higher 32-bits.
Currently the move that's generated isn't handled by coalescing
because the source and destination belong to different register
classes. The insert_subreg is meant to be a means to move values
implicitly between register classes that have a subreg
relationship. So if insert_subreg semantics must be target
independent, then I think you isel the zero-extending move to be:
(i64 (INSERT_SUBREG (i64 0), GR32:$src, 3))
But that's wrong. Remember the superreg argument is an read / mod /
write operand. That is, the first operand is a use, the def is the
LHS but we are forcing the allocator to target the same physical
register.
v1 = some existing value
v1 = insert_subreg v1, GR32:$src, 3
But zext is zeroing out the top part. i.e. zext is equal to
mov v1, 0
v1 = insert_subreg v1, GR32:$src, 3
I'm suggesting to expand the semantics of insert_subreg as described
above.
The thing is that the general coalescing will be able to determine
that the copy from undef is unneeded for (INSERT_SUBREG (i64
undef), GR32:$src, 3), but it would take a target specific hook to
know that the constant zero is unneeded on x86-64. A target
specific hook for this might be useful, but I think that this is
in the realm of future work now.
Sorry, I am not following. zext on x86-64, i.e. the 32-bit move,
cannot be coalesced away. No need for target specific hook.
I simply disagree here:
http://www.x86-64.org/documentation/assembly.html see the section
'Implicit Zero Extend'
EAX = op
RAX = mov EAX <= this may be removed
... = use RAX
3: why is there a two operant variant in the first place? Why
not use undef for the superreg operant?
To note, the two operand variant is of the MachineInstr. The DAG
form would be to represent the superregister as coming from an
undef node, but this gets isel'd to the two operand MachineInstr
of insert_subreg.
The reason is that undef is typically selected to an implicit
def of a register. This causes an unnecessary move to be
generated later on. This move can be optimized away later with
more difficulty during subreg lowering by checking whether the
input register is defined by an implicit def pseudo instruction,
but instead I decided to perform the optimization during ISel on
the DAG form during instruction selection.
With what you're suggesting
reg1024 = ...
reg1026 = insert_subreg undef, reg1024, 1
reg1027 = insert_subreg reg1026, reg1025, 1
use reg1027
would be isel'd to then subreg lowered to:
R6 = ...
implicit def R01 <= this implicit def is unecessary
That's a pseudo instruction, it doesn't cost anything.
R23 = R01 <= this copy is unnecessary
It can be coalesced to:
R23 = undef
R2 = R6
R45 = R23
R5 = R6
use R45
Using undef explicit is the right way to go. There is a good
reason it's there. Having the two operand version of
insert_subreg that implicitly use an undef value doesn't fit into
the overall llvm philosophy.
Right now the coalescing that you are describing is happening
during isel. Are you simply saying that you'd rather have the
coalescing happen during subreg lowering? I can accept that, but
would you share your reasons?
There really isn't a very good argument for having the 2 different
versions of insert_subreg. undef use must be explicitly modeled. I
really don't see what you mean by coalescing during isel. isel
doesn't have the concept of coalescing. Also don't forget
everything must remain ssa until register allocation.
It depends on what you intend the semantics of insert_subreg to be.
Under my proposal above, there would only be the 3 operand version.
However, there would be a variant where where the input superreg
operand is an immediate, but the immediate value would then be explicit.
4: what's the benefit of isel a zext to insert_subreg and then
xform it to a 32-bit move?
The xform to a 32-bit move is only the conservative behavior.
The zext can be implicit if regalloc can coalesce subreg_inserts.
Why not just isel the zext to the move? It's not legal to
coalesce it away anyway.
Actually it is legal to coalesce it. On x86-64 any write to a 32-
bit register zero extends the value to 64-bits. For the
insert_subreg under discussion the inserted value is a 32-bit
result, that has in-fact already be zero extended implicitly.
It's not legal to coalesce away the 32-bit zero extending move.
Suppose RAX contains some value with top 32-bits non-zero.
mov EAX, EAX (zero extend top bits)
use RAX (expecting top bits to be zero)
Coalesced away the move is a miscompilation.
Indeed, but what you have described is not a valid insert_subreg
either. Insert_subreg would take EAX as its input operand and
would only be coalesced into an instruction that defines EAX
explicitly (i.e. an instruction that defines RAX defines EAX
implicitly, not explicitly so no coalescing). I think that this
coalescing rule is generally required for correctness when
coalescing insert_subreg under any architecture.
What I've been saying all along. zero_extend on x86-64 isn't the
same as a insert_sub, don't try to model it that way.
Your example is (use (zext (i32 (trunc RAX)))), which cannot be done
without an explicit mov instruction. What I was contending is that
(use (zext EAX)) can be done without an explicit mov instruction.
--
Christopher Lamb
_______________________________________________
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits