------- Comment #6 from bonzini at gnu dot org  2007-11-06 17:05 -------
I think P1 is a little too much since this requires -fforce-addr.

Anyway, here are my findings and thoughts:

1) reduced testcase:

void oc_frag_recon_inter2_mmx(unsigned char *_dst,int _dst_ystride,
 const unsigned char *_src1,int _src1_ystride,const unsigned char *_src2,
 int _src2_ystride,const int *_residue)
{
  long a;
  __asm__ __volatile__(
   "# %[src2] %[src1_ystride] %[src1] %[a] %[src2_ystride] \n\t"
   "# %[residue] %[dst_ystride] %[dst]\n\t"
   :[a]"=&a"(a),[dst]"+r"(_dst),[residue]"+r"(_residue),
    [src1]"+r"(_src1),[src2]"+r"(_src2)
   :[dst_ystride]"m"((long)_dst_ystride),
    [src1_ystride]"m"((long)_src1_ystride),
    [src2_ystride]"m"((long)_src2_ystride)
 );
}

Note that the testcase is *relying on GCC to undo flag_force_addr!*  In fact,
the author used an "m" constraint exactly because they knew they would run out
of registers: on a less starved machine, one would have used "r"!
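
To make the "m" versus "r" distinction concrete, here is a minimal sketch.  It
is not part of the reduced testcase: the function names are made up, and the
asm template is left empty so that it assembles on any target.  With "m" the
operand stays in its stack slot and only needs to be addressable; with "r" the
value itself must occupy a register.  As I understand it, -fforce-addr is what
turns the first form back into a register consumer, by loading each address
into a register first.

```c
/* Hypothetical illustration; names are not from the testcase. */

/* "m": the operand is referenced in memory, so the value itself does
   not need a register of its own. */
long stride_via_mem(long stride)
{
  long dst = 0;
  __asm__ __volatile__("" : "+r"(dst) : "m"(stride));
  return dst + stride;
}

/* "r": the operand value must be loaded into a register before the
   asm, which is what one would write on a machine with registers to
   spare. */
long stride_via_reg(long stride)
{
  long dst = 0;
  __asm__ __volatile__("" : "+r"(dst) : "r"(stride));
  return dst + stride;
}
```

Both functions compute the same result; only the register pressure around the
asm differs.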

This makes me wonder if we shouldn't kill -fforce-addr just like we disposed of
-fforce-mem.  Let's go on anyway and try to fix it.

2) one of the problems is that at -O we do not run fwprop.  We probably want
to.

3) Here is a hack that I thought would fix it.

Index: ../../peak-gcc-src/gcc/stmt.c
===================================================================
--- ../../peak-gcc-src/gcc/stmt.c       (revision 129768)
+++ ../../peak-gcc-src/gcc/stmt.c       (working copy)
@@ -660,6 +660,7 @@ expand_asm_operands (tree string, tree o
   const char **constraints
     = alloca ((noutputs + ninputs) * sizeof (const char *));
   int old_generating_concat_p = generating_concat_p;
+  int save_force_addr = flag_force_addr;

   /* An ASM with no outputs needs to be treated as volatile, for now.  */
   if (noutputs == 0)
@@ -780,6 +781,7 @@ expand_asm_operands (tree string, tree o
   /* Second pass evaluates arguments.  */

   ninout = 0;
+  flag_force_addr = false;
   for (i = 0, tail = outputs; tail; tail = TREE_CHAIN (tail), i++)
     {
       tree val = TREE_VALUE (tail);
@@ -1072,6 +1074,8 @@ expand_asm_operands (tree string, tree o
       emit_insn (body);
     }

+  flag_force_addr = save_force_addr;
+
   /* For any outputs that needed reloading into registers, spill them
      back to where they belong.  */
   for (i = 0; i < noutputs; ++i)

The idea is that in an asm, the author already has full control over
addressing modes.  If they want a simple one, they can get it by using "r".
Otherwise, they don't want -fforce-addr (assuming anybody wants it at all...).
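
Stripped of the GCC context, the three hunks above are just a save/clear/restore
of a global around the operand-expansion code.  A standalone sketch of that
pattern, with a made-up global and a made-up helper standing in for
flag_force_addr and the real expansion code:

```c
/* Hypothetical stand-in for GCC's flag_force_addr global. */
int force_addr_flag = 1;

/* Hypothetical stand-in for expanding one asm operand: reports whether
   the address would be forced into a register. */
int expand_operand_forces_addr(void)
{
  return force_addr_flag;
}

/* Mirrors the shape of the patch: save the flag, clear it while the
   operands are expanded, restore it afterwards. */
int expand_asm_sketch(void)
{
  int save = force_addr_flag;   /* remember the user's setting */
  int forced;

  force_addr_flag = 0;          /* operands are expanded with forcing off */
  forced = expand_operand_forces_addr();
  force_addr_flag = save;       /* later code sees the original value */

  return forced;
}

/* Lets a caller check that the setting survived the call. */
int force_addr_after(void)
{
  return force_addr_flag;
}
```

The point of the sketch is only that the override is local to expansion: any
later pass that consults the flag still sees it set.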

It doesn't actually work, though: in 4.3 CSE not only fails to do the
propagation, but with -fforce-addr it actively undoes it!  So, here are three
possibilities:

a) temporarily set flag_force_addr to false during CSE (in addition to the
above stmt.c hunks, and enabling fwprop at -O).

b) make flag_force_addr effective only during expansion (in addition to the
above stmt.c hunks, and enabling fwprop at -O).  With tree level optimization,
few RTL passes modify mems (CSE, fwprop), and they do so because of addressing
mode selection.  So none of them should benefit from -fforce-addr, even though
only CSE is affected now.

c) disable flag_force_addr completely (doesn't require anything else, though
we might still want to enable fwprop at -O).

My order of preference is c, then b, then a.  Anybody else?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33713
