------- Comment #6 from bonzini at gnu dot org  2007-11-06 17:05 -------
I think P1 is a little too much since this requires -fforce-addr.
Anyway, here are my findings and thoughts:

1) reduced testcase:

void oc_frag_recon_inter2_mmx(unsigned char *_dst,int _dst_ystride,
 const unsigned char *_src1,int _src1_ystride,const unsigned char *_src2,
 int _src2_ystride,const int *_residue)
{
  long a;
  __asm__ __volatile__(
    "# %[src2] %[src1_ystride] %[src1] %[a] %[src2_ystride] \n\t"
    "# %[residue] %[dst_ystride] %[dst]\n\t"
    :[a]"=&a"(a),[dst]"+r"(_dst),[residue]"+r"(_residue),
     [src1]"+r"(_src1),[src2]"+r"(_src2)
    :[dst_ystride]"m"((long)_dst_ystride),
     [src1_ystride]"m"((long)_src1_ystride),
     [src2_ystride]"m"((long)_src2_ystride)
  );
}

Note that the testcase is *relying on GCC to undo flag_force_addr!*  In fact,
the author used a "m" constraint exactly because they knew they would run out
of registers: on a less starved machine, one would have used "r"!

This makes me wonder if we shouldn't kill -fforce-addr just like we disposed
of -fforce-mem.  Let's go on anyway and try to fix it.

2) one of the problems is that at -O we do not run fwprop.  We probably want
to.

3) Here is a hack that I thought would fix it.

Index: ../../peak-gcc-src/gcc/stmt.c
===================================================================
--- ../../peak-gcc-src/gcc/stmt.c       (revision 129768)
+++ ../../peak-gcc-src/gcc/stmt.c       (working copy)
@@ -660,6 +660,7 @@ expand_asm_operands (tree string, tree o
   const char **constraints
     = alloca ((noutputs + ninputs) * sizeof (const char *));
   int old_generating_concat_p = generating_concat_p;
+  int save_force_addr = flag_force_addr;
 
   /* An ASM with no outputs needs to be treated as volatile, for now.  */
   if (noutputs == 0)
@@ -780,6 +781,7 @@ expand_asm_operands (tree string, tree o
   /* Second pass evaluates arguments.  */
 
   ninout = 0;
+  flag_force_addr = false;
   for (i = 0, tail = outputs; tail; tail = TREE_CHAIN (tail), i++)
     {
       tree val = TREE_VALUE (tail);
@@ -1072,6 +1074,8 @@ expand_asm_operands (tree string, tree o
       emit_insn (body);
     }
 
+  flag_force_addr = save_force_addr;
+
   /* For any outputs that needed reloading into registers, spill them
      back to where they belong.  */
   for (i = 0; i < noutputs; ++i)

The idea is that in an asm, the author already has full control of memory
modes.  If they want to use a simple one, they can (using "r").  Otherwise,
they don't want -fforce-addr (assuming somebody wants it...).

It doesn't work, actually, because CSE not only does not do the propagation in
4.3; with -fforce-addr, it undoes it!!!

So, here are the three possibilities:

a) temporarily set flag_force_addr to false during CSE (in addition to the
above stmt.c hunks, and enabling fwprop at -O).

b) make flag_force_addr effective only during expansion (in addition to the
above stmt.c hunks, and enabling fwprop at -O).  With tree level optimization,
few RTL passes modify mems (CSE, fwprop), and they do so because of addressing
mode selection.  So none of them should benefit of -fforce-addr, even though
only CSE is affected now.

c) disable flag_force_addr completely (doesn't require anything else, we might
still want to enable fwprop at -O?)

My preference is c, b, a.  Anybody else?


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33713
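
For anyone who wants to see the shape of the stmt.c hack from 3) in isolation, here is a minimal standalone sketch of the save / clear / restore pattern.  It is not GCC code: demo_force_addr, demo_expand_operand and demo_expand_asm are hypothetical stand-ins invented for this illustration, and the "expansion" is reduced to reading the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's global flag_force_addr (assumption:
   the option is switched on for the whole compilation).  */
bool demo_force_addr = true;

/* Hypothetical stand-in for expanding one operand: it only reports
   whether the address would be forced into a register under the
   current value of the flag.  */
bool demo_expand_operand (void)
{
  return demo_force_addr;
}

/* Expand N asm operands with the flag temporarily cleared, then put
   the flag back.  Returns how many operands had their address forced.  */
int demo_expand_asm (int n)
{
  bool save_force_addr = demo_force_addr;   /* save the global */
  int forced = 0;
  int i;

  demo_force_addr = false;                  /* off while expanding operands */
  for (i = 0; i < n; i++)
    if (demo_expand_operand ())
      forced++;

  demo_force_addr = save_force_addr;        /* restore for later code */
  assert (demo_force_addr == save_force_addr);
  return forced;
}
```

Calling demo_expand_asm (8) forces no addresses, but once it returns the flag is set again, so anything that consults it afterwards behaves as before.  That is at least consistent with what is described above: the override covers expansion only, and a later pass that still looks at the flag (CSE here) is free to undo the effect.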