Hello!
The attached patch implements addr32-prefixed addresses for x86_64
targets, where memory locations are accessed with 32-bit base and index
registers in the form (zero_extend:DI (... SImode registers ...)).
The optimization rarely (if at all) triggers on x86_64, but is very
important on x32 (see [1]), where many LEAs get moved into the addresses
of the instructions that use them.
Of some interest is the inability of reload to fix up its own generated
moves for the offsettable memory operand constraint "o", as happens with
TImode moves. See [2] for further analysis and [3] for the workaround.
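As a rough illustration of the kind of source that produces such addresses, consider the sketch below. The function names are hypothetical and are not taken from the patch or from the PR. On x32, where pointers are 32 bits wide, the address a + i*4 is computed in SImode and then zero-extended to DImode; with zero-extended addresses accepted, the compiler may be able to use a memory operand along the lines of (%edi,%esi,4), which the assembler encodes with the addr32 prefix, instead of a separate LEA. The exact output depends on the compiler version and options, so none is shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: an indexed load whose address is base + index*4.
   With 32-bit pointers (x32) the whole address computation fits in
   SImode registers.  */
int32_t
load_elem (const int32_t *a, uint32_t i)
{
  return a[i];
}

/* Small self-check of the C semantics only; it says nothing about the
   instructions the compiler chooses.  */
void
load_elem_check (void)
{
  int32_t v[4] = { 10, 20, 30, 40 };
  assert (load_elem (v, 2) == 30);
}
```

Compiling load_elem with -mx32 -O2 -S before and after the patch is one way to see whether the LEA disappears.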

2011-08-08  Uros Bizjak  <[email protected]>

	PR target/49781
	* config/i386/i386.c (ix86_decompose_address): Allow zero-extended
	SImode addresses.
	(ix86_print_operand_address): Handle zero-extended addresses.
	(memory_address_length): Add length of addr32 prefix for
	zero-extended addresses.
	(ix86_secondary_reload): Handle moves to/from double-word general
	registers from/to zero-extended addresses.
	* config/i386/predicates.md (lea_address_operand): Reject
	zero-extended operands.

The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}. Additionally, H.J. tested the patch on an x32 target with the GCC
bootstrap/regression tests, a build of glibc (+regression tests) and
SPEC2000/2006.
The patch was committed to mainline SVN.
BTW: There is a strange optimization in the combine pass, where a
zero-extended address is converted on the fly to:
Trying 9 -> 10:
Failed to match this instruction:
(... (and:DI (subreg:DI (plus:SI (ashift:SI (reg/v:SI 63 [ i ])
(const_int 2 [0x2]))
(subreg:SI (reg/v/f:DI 62 [ a ]) 0)) 0)
(const_int 4294967295 [0xffffffff]))
...)
While it is easy to add a pattern recognizer for this RTX to
ix86_decompose_address/ix86_legitimate_address_p, I would like to
understand the purpose of the conversion better and eventually fix it
in the combine pass.
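For reference, the two RTL forms compute the same value, which is presumably why combine treats them as interchangeable. A minimal C model of that equivalence follows; the function names and the "garbage" parameter are assumptions made for illustration. The parameter stands in for the undefined upper half of the paradoxical (subreg:DI ... 0).

```c
#include <assert.h>
#include <stdint.h>

/* Models (zero_extend:DI (x:SI)).  */
uint64_t
zext_form (uint32_t x)
{
  return (uint64_t) x;
}

/* Models (and:DI (subreg:DI (x:SI) 0) (const_int 0xffffffff)).
   The upper 32 bits of the paradoxical subreg are undefined, so they
   are filled with an arbitrary value here before masking.  */
uint64_t
and_form (uint32_t x, uint32_t garbage)
{
  uint64_t wide = ((uint64_t) garbage << 32) | x;
  return wide & UINT64_C (0xffffffff);
}

/* Self-check: the mask discards whatever was in the upper half.  */
void
zext_and_check (void)
{
  assert (zext_form (0xdeadbeefu) == and_form (0xdeadbeefu, 0x12345678u));
  assert (zext_form (0u) == and_form (0u, 0xffffffffu));
}
```

The model only shows that the values agree; it does not explain why combine prefers the AND form in an address context.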
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49781
[2] http://gcc.gnu.org/ml/gcc/2011-08/msg00129.html
[3] http://gcc.gnu.org/ml/gcc/2011-08/msg00157.html
Uros.
Index: predicates.md
===================================================================
--- predicates.md (revision 177547)
+++ predicates.md (working copy)
@@ -801,6 +801,10 @@
struct ix86_address parts;
int ok;
+ /* LEA handles zero-extend by itself. */
+ if (GET_CODE (op) == ZERO_EXTEND)
+ return false;
+
ok = ix86_decompose_address (op, &parts);
gcc_assert (ok);
return parts.seg == SEG_DEFAULT;
Index: i386.c
===================================================================
--- i386.c (revision 177547)
+++ i386.c (working copy)
@@ -11142,6 +11142,14 @@ ix86_decompose_address (rtx addr, struct ix86_addr
int retval = 1;
enum ix86_address_seg seg = SEG_DEFAULT;
+ /* Allow zero-extended SImode addresses,
+ they will be emitted with addr32 prefix. */
+ if (TARGET_64BIT
+ && GET_CODE (addr) == ZERO_EXTEND
+ && GET_MODE (addr) == DImode
+ && GET_MODE (XEXP (addr, 0)) == SImode)
+ addr = XEXP (addr, 0);
+
if (REG_P (addr))
base = addr;
else if (GET_CODE (addr) == SUBREG)
@@ -14159,9 +14167,13 @@ ix86_print_operand_address (FILE *file, rtx addr)
}
else
{
- /* Print DImode registers on 64bit targets to avoid addr32 prefixes. */
- int code = TARGET_64BIT ? 'q' : 0;
+ int code = 0;
+ /* Print SImode registers for zero-extended addresses to force
+ addr32 prefix. Otherwise print DImode registers to avoid it. */
+ if (TARGET_64BIT)
+ code = (GET_CODE (addr) == ZERO_EXTEND) ? 'l' : 'q';
+
if (ASSEMBLER_DIALECT == ASM_ATT)
{
if (disp)
@@ -21772,7 +21784,8 @@ assign_386_stack_local (enum machine_mode mode, en
}
/* Calculate the length of the memory address in the instruction
- encoding. Does not include the one-byte modrm, opcode, or prefix. */
+ encoding. Includes addr32 prefix, does not include the one-byte modrm,
+ opcode, or other prefixes. */
int
memory_address_length (rtx addr)
@@ -21799,8 +21812,10 @@ memory_address_length (rtx addr)
base = parts.base;
index = parts.index;
disp = parts.disp;
- len = 0;
+ /* Add length of addr32 prefix. */
+ len = (GET_CODE (addr) == ZERO_EXTEND);
+
/* Rule of thumb:
- esp as the base always wants an index,
- ebp as the base always wants a displacement,
@@ -28233,6 +28248,15 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class
enum machine_mode mode,
secondary_reload_info *sri ATTRIBUTE_UNUSED)
{
+ /* Double-word spills from general registers to non-offsettable memory
+ references (zero-extended addresses) go through XMM register. */
+ if (TARGET_64BIT
+ && MEM_P (x)
+ && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+ && rclass == GENERAL_REGS
+ && !offsettable_memref_p (x))
+ return SSE_REGS;
+
/* QImode spills from non-QI registers require
intermediate register on 32bit targets. */
if (!TARGET_64BIT