On Jul 29, 2007, at 6:20 PM, Evan Cheng wrote:
Sent from my iPhone
On Jul 28, 2007, at 4:36 PM, Christopher Lamb
<[EMAIL PROTECTED]> wrote:
On Jul 28, 2007, at 2:26 PM, Evan Cheng wrote:
On Jul 28, 2007, at 11:52 AM, Christopher Lamb
<[EMAIL PROTECTED]> wrote:
On Jul 28, 2007, at 1:48 AM, Evan Cheng wrote:
Very cool! I need to read it more carefully.
But I see you are lowering zext to a single insert_subreg. Is
that right? It won't zero out the top part, no?
It's only lowering (zext i32 to i64) to an insert_subreg on
x86-64, where all writes to 32-bit registers implicitly
zero-extend into the upper 32 bits.
I know. But they mismatch semantically. An insert_subreg to the
lower part should not change the upper half. I think this is only
legal for anyext.
On x86-64 the semantics of a 2 operand i32 insert_subreg is that
the input super-value is implicitly zero. So in this sense the
insert isn't changing the upper half, it's just that the upper
half is being set to zero implicitly rather than explicitly.
Notice that the insert_subreg is the two operand version (implicit
super value), not the three operand version. If the insert were the
three operand version, and the super value were coming from an
implicit def, I'd agree with you, but it's not.
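To make the two readings above concrete, here is a small Python model. It is a sketch, not LLVM code: the names write32, insert_low32_preserving and zext32 are made up for illustration, and the only hardware fact it assumes is the x86-64 rule cited in the thread, that a write to a 32-bit register clears bits 63..32 of the containing 64-bit register.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def zext32(value):
    """Zero-extend a 32-bit value to 64 bits."""
    return value & MASK32


def write32(old64, value):
    """x86-64 rule: writing a 32-bit register zero-extends into the
    full 64-bit register. The old contents (old64) are deliberately
    ignored, because the upper half is overwritten with zeros."""
    return value & MASK32


def insert_low32_preserving(super64, value):
    """Generic reading of insert_subreg: replace the low 32 bits and
    keep the upper 32 bits of the super value unchanged."""
    return (super64 & MASK64 & ~MASK32) | (value & MASK32)


stale = 0xDEADBEEF_00000000   # arbitrary old contents of the 64-bit register
v = 0x1234_5678               # 32-bit value being inserted

# A 32-bit write already produces the zero-extended value.
assert write32(stale, v) == zext32(v)
# A preserving insert into a nonzero super value is not a zext.
assert insert_low32_preserving(stale, v) != zext32(v)
# A preserving insert into an all-zero super value is a zext.
assert insert_low32_preserving(0, v) == zext32(v)
```

Under this model the disagreement reduces to which super value the two operand form stands for: an implicit zero makes the insert equal to zext, while an undef super value only justifies anyext.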
Ok, let's step back for a second. There are a couple of issues that
should be addressed. Please help me understand. :)
1: Semantics of insert_subreg should be the same across all
targets, right?
I'm not certain that this should be so. x86-64 clearly has
target-specific semantics for a 32-bit into 64-bit insert.
2: The two operand variant of insert_subreg should mean the superreg
is undef. If you insert a value into a low part, the rest of the
superreg is still undef.
I think the insert_subreg instruction (both 2 and 3 operand
versions) must have semantics specific to the target. For example,
on x86-64 there is no valid 3 operand insert_subreg for a 32-bit
value into 64 bits, because the 32-bit result is always going to
be zero extended and overwrite the upper 32 bits.
3: Why is there a two operand variant in the first place? Why not
use undef for the superreg operand?
To note, the two operand variant is a form of the MachineInstr. The
DAG form represents the superregister as coming from an undef node,
but this gets isel'd to the two operand MachineInstr of insert_subreg.
The reason is that undef is typically selected to an implicit def of
a register, which causes an unnecessary move to be generated later on.
That move could be optimized away later, with more difficulty, during
subreg lowering by checking whether the input register is defined by
an implicit def pseudo instruction, but instead I decided to perform
the optimization on the DAG form during instruction selection.
With what you're suggesting
reg1024 = ...
reg1026 = insert_subreg undef, reg1024, 1
reg1027 = insert_subreg reg1026, reg1025, 1
use reg1027
would be isel'd and then subreg lowered to:
R6 = ...
implicit def R01 <= this implicit def is unnecessary
R23 = R01 <= this copy is unnecessary
R2 = R6
R45 = R23
R5 = R6
use R45
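The later cleanup mentioned above (dropping a copy whose source is only an implicit def) can be sketched as a toy Python pass over the lowered sequence. The (op, dst, src) tuple encoding and the name strip_undef_copies are hypothetical, and the sketch assumes straight-line code where each register is defined once. It is not LLVM's subreg lowering, only an illustration of which two instructions go away.

```python
def strip_undef_copies(instrs):
    """Remove copies that read a register defined only by an implicit
    def, then remove implicit defs that nothing reads any more.

    Each instruction is a tuple (op, dst, src); unused fields are None.
    """
    # Registers whose only definition is an implicit def (undef value).
    undef = {dst for op, dst, _ in instrs if op == "implicit_def"}

    # A copy out of an undef register carries no information: drop it.
    kept = [i for i in instrs if not (i[0] == "copy" and i[2] in undef)]

    # Drop implicit defs whose register is no longer read by anything.
    still_read = {src for _, _, src in kept if src is not None}
    return [
        i for i in kept
        if not (i[0] == "implicit_def" and i[1] not in still_read)
    ]


# The lowered sequence from the thread.
lowered = [
    ("def", "R6", None),
    ("implicit_def", "R01", None),   # marked unnecessary in the thread
    ("copy", "R23", "R01"),          # marked unnecessary in the thread
    ("copy", "R2", "R6"),
    ("copy", "R45", "R23"),
    ("copy", "R5", "R6"),
    ("use", None, "R45"),
]

for instr in strip_undef_copies(lowered):
    print(instr)
```

Doing the equivalent on the DAG during instruction selection avoids having to rediscover, after lowering, that the source register was never really defined.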
4: What's the benefit of isel'ing a zext to insert_subreg and then
xform'ing it to a 32-bit move?
The xform to a 32-bit move is only the conservative behavior. The
zext can be implicit if regalloc can coalesce subreg_inserts.
Why not just isel the zext to the move? It's not legal to coalesce
it away anyway.
Actually it is legal to coalesce it. On x86-64 any write to a 32-bit
register zero extends the value to 64 bits. For the insert_subreg
under discussion the inserted value is a 32-bit result that has in
fact already been zero extended implicitly.
Also, the current behavior is to use a 32-bit mov instruction for
both zeroext and anyext; I don't see how this is any different.
--
Chris
On Jul 28, 2007, at 12:17 AM, Christopher Lamb
<[EMAIL PROTECTED]> wrote:
This patch changes the X86 back end to use the new subreg
operations for appropriate truncate and extend operations.
This should allow regression testing of the subreg feature
going forward, as it's now used in a public target.
The patch passed DejaGnu and all of SingleSource on my x86
machine, but there are changes for x86-64 as well which I
haven't been able to test. Output assembly for x86-64 appears
sane, but I'd appreciate someone giving the patch a try on
their x86-64 system. Other 32-bit x86 testing is also
appreciated.
Thanks
--
Christopher Lamb
<x86_subregs.patch>
_______________________________________________
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits