Hello!

The attached patch introduces generation of addr32-prefixed addresses, mainly intended to merge ZERO_EXTENDed LEA calculations into the address. After fixing various inconsistencies with "o" constraints, the patch works surprisingly well (in its current form it fixes all problems reported in the PR [1]), but one problem remains w.r.t. the handling of the "o" constraint.

Patched gcc ICEs on gcc.dg/torture/pr47744-2.c with:

$ ~/gcc-build-fast/gcc/cc1 -O2 -mx32 -std=gnu99 -quiet pr47744-2.c
pr47744-2.c: In function ‘matmul_i16’:
pr47744-2.c:40:1: error: insn does not satisfy its constraints:
(insn 116 66 67 4 (set (reg:TI 0 ax)
        (mem:TI (zero_extend:DI (plus:SI (reg:SI 4 si [orig:114 ivtmp.26 ] [114])
                    (reg:SI 5 di [orig:101 dest_y ] [101]))) [6 MEM[base: dest_y_18, index: ivtmp.26_53, offset: 0B]+0 S16 A128])) pr47744-2.c:34 60 {*movti_internal_rex64}
     (nil))
pr47744-2.c:40:1: internal compiler error: in reload_cse_simplify_operands, at postreload.c:403
Please submit a full bug report,
...

... due to the fact that the address is not offsettable, and plus ((zero_extend (...)) (const_int ...)) gets rejected by ix86_legitimate_address_p.

However, the section "16.8.1 Simple Constraints" of the documentation claims:

--quote--
* A nonoffsettable memory reference can be reloaded by copying the address into a register. So if the constraint uses the letter `o', all memory references are taken care of.
--/quote--

As I read this sentence, the RTX is forced into a temporary register, and reload tries to satisfy the "o" constraint with plus ((reg ...) (const_int ...)), as described in the introduction of the "o" constraint a couple of pages earlier. Unfortunately, this does not seem to be the case.

Is there anything wrong with my approach, or is there something wrong in reload?

2011-08-05  Uros Bizjak  <ubiz...@gmail.com>

        PR target/49781
        * config/i386/i386.c (ix86_decompose_address): Allow zero-extended
        SImode addresses.
        (ix86_print_operand_address): Handle zero-extended addresses.
        (memory_address_length): Add length of addr32 prefix for
        zero-extended addresses.
        * config/i386/predicates.md (lea_address_operand): Reject
        zero-extended operands.

Patch is otherwise bootstrapped and tested on x86_64-pc-linux-gnu {,-m32} without regressions.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49781

Thanks,
Uros.

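To make the offsettability problem concrete, here is a toy model in C, not GCC code. All names (toy_rtx, toy_legitimate_address_p, toy_offsettable_p) are hypothetical, and the legitimacy rule is a simplified assumption: a ZERO_EXTEND is accepted only as the outermost wrapper of an address. Under that assumption the zero-extended address is legitimate by itself, but adding a constant on the outside yields plus ((zero_extend (...)) (const_int ...)), which is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for RTL address expressions; this is NOT GCC's rtx type.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_ZERO_EXTEND };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;   /* first operand, NULL for leaves */
  const struct toy_rtx *op1;   /* second operand, used by TOY_PLUS only */
};

/* Accept reg, reg+reg, reg+const and (reg+reg)+const.
   No ZERO_EXTEND is allowed anywhere inside.  */
static bool
toy_plain_address_p (const struct toy_rtx *x)
{
  assert (x != NULL);
  if (x->code == TOY_REG)
    return true;
  if (x->code != TOY_PLUS)
    return false;
  if (x->op1->code == TOY_CONST_INT)
    return (x->op0->code == TOY_REG
            || (x->op0->code == TOY_PLUS
                && x->op0->op0->code == TOY_REG
                && x->op0->op1->code == TOY_REG));
  return x->op0->code == TOY_REG && x->op1->code == TOY_REG;
}

/* Simplified assumption modelled on the patch: strip a ZERO_EXTEND only
   when it is the outermost code, then check the plain address.  */
static bool
toy_legitimate_address_p (const struct toy_rtx *x)
{
  assert (x != NULL);
  if (x->code == TOY_ZERO_EXTEND)
    x = x->op0;
  return toy_plain_address_p (x);
}

/* An address is treated as offsettable here if it stays legitimate after
   being wrapped as (plus addr (const_int N)).  */
static bool
toy_offsettable_p (const struct toy_rtx *addr)
{
  static const struct toy_rtx off = { TOY_CONST_INT, NULL, NULL };
  struct toy_rtx sum = { TOY_PLUS, addr, &off };

  return toy_legitimate_address_p (addr) && toy_legitimate_address_p (&sum);
}
```

With two registers si and di, the model accepts (zero_extend (plus si di)) as an address but reports it as not offsettable, while the unwrapped (plus si di) is offsettable.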
Index: config/i386/predicates.md
===================================================================
--- config/i386/predicates.md	(revision 177456)
+++ config/i386/predicates.md	(working copy)
@@ -801,6 +801,10 @@
   struct ix86_address parts;
   int ok;
 
+  /* LEA handles zero-extend by itself.  */
+  if (GET_CODE (op) == ZERO_EXTEND)
+    return false;
+
   ok = ix86_decompose_address (op, &parts);
   gcc_assert (ok);
   return parts.seg == SEG_DEFAULT;
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 177456)
+++ config/i386/i386.c	(working copy)
@@ -11146,6 +11146,14 @@ ix86_decompose_address (rtx addr, struct ix86_addr
   int retval = 1;
   enum ix86_address_seg seg = SEG_DEFAULT;
 
+  /* Allow zero-extended SImode addresses,
+     they will be emitted with addr32 prefix.  */
+  if (TARGET_64BIT
+      && GET_CODE (addr) == ZERO_EXTEND
+      && GET_MODE (addr) == DImode
+      && GET_MODE (XEXP (addr, 0)) == SImode)
+    addr = XEXP (addr, 0);
+
   if (REG_P (addr))
     base = addr;
   else if (GET_CODE (addr) == SUBREG)
@@ -14163,9 +14171,13 @@ ix86_print_operand_address (FILE *file, rtx addr)
     }
   else
     {
-      /* Print DImode registers on 64bit targets to avoid addr32 prefixes.  */
-      int code = TARGET_64BIT ? 'q' : 0;
+      int code = 0;
 
+      /* Print SImode registers for zero-extended addresses to force
+         addr32 prefix.  Otherwise print DImode registers to avoid it.  */
+      if (TARGET_64BIT)
+        code = (GET_CODE (addr) == ZERO_EXTEND) ? 'l' : 'q';
+
       if (ASSEMBLER_DIALECT == ASM_ATT)
         {
           if (disp)
@@ -21776,7 +21788,8 @@ assign_386_stack_local (enum machine_mode mode, en
 }
 
 /* Calculate the length of the memory address in the instruction
-   encoding.  Does not include the one-byte modrm, opcode, or prefix.  */
+   encoding.  Includes addr32 prefix, does not include the one-byte modrm,
+   opcode, or other prefixes.  */
 
 int
 memory_address_length (rtx addr)
@@ -21803,8 +21816,10 @@ memory_address_length (rtx addr)
   base = parts.base;
   index = parts.index;
   disp = parts.disp;
-  len = 0;
 
+  /* Add length of addr32 prefix.  */
+  len = (GET_CODE (addr) == ZERO_EXTEND);
+
   /* Rule of thumb:
      - esp as the base always wants an index,
      - ebp as the base always wants a displacement,
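
For readers skimming the last three hunks, the following standalone sketch restates their intent outside of GCC. It is only an illustration: struct toy_address and both functions are hypothetical names, and base_len stands in for whatever the remaining rules in memory_address_length compute. The one hardware fact relied on is that the addr32 (address-size override) prefix is the single byte 0x67.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a decomposed address; not a GCC structure.  */
struct toy_address
{
  bool zero_extended;   /* address was (zero_extend:DI (...:SI)) */
  int base_len;         /* SIB/displacement bytes, computed elsewhere */
};

/* Register print code: 'l' selects 32-bit register names and so forces
   the addr32 prefix, 'q' selects 64-bit names, 0 leaves the default on
   32-bit targets.  */
static int
toy_reg_print_code (bool target_64bit, const struct toy_address *a)
{
  assert (a != NULL);
  if (!target_64bit)
    return 0;
  return a->zero_extended ? 'l' : 'q';
}

/* Address length: one extra byte for the addr32 prefix (0x67) when the
   address is zero-extended.  */
static int
toy_address_length (const struct toy_address *a)
{
  assert (a != NULL);
  return a->base_len + (a->zero_extended ? 1 : 0);
}
```

In this model a zero-extended address with two bytes of SIB/displacement has length 3 and is printed with 32-bit register names on a 64-bit target.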