On 05/20/2014 04:56 AM, Richard Biener wrote:
On Tue, May 20, 2014 at 6:29 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
On 05/18/2014 08:45 PM, Sandra Loosemore wrote:
On 05/18/2014 02:59 PM, Jan Hubicka wrote:
For cases like local-statics-7 your approach can be "saved" by adding
simple IPA analysis
to look for static vars that are used only by one function and keeping
your DSE code active
for them, so we can still get rid of this special case assignments
during late compilation.
I am however not quite convinced it is worth the effort - do you have
some real world
cases where it helps?

Um, the well-known benchmark.  ;-)

I looked at the generated code for this benchmark and see that your
patch is indeed not getting rid of all the pointless static
variables, while ours is, so this is quite disappointing.  I'm
thinking now that we will still need to retain our patch at least
locally, although we'd really like to get it on trunk if possible.

Yours patch can really be improved by adding simple IPA analysis to work
out what variables are refernced by one function only instead of using
not-longer-that-relevant local static info.
As last IPA pass for each static variable with no address taken, look at all
references and see if they all come from one function or functions inlined to
it.

Here is another testcase vaguely based on the same kind of
signal-processing algorithm as the benchmark, but coded without
reference to it.  I have arm-none-eabi builds around for both 4.9.0
with our remove_local_statics patch applied, and trunk with your
patch.  With -O2, our optimization produces 23 instructions and gets
rid of all 3 statics, yours produces 33 and only gets rid of 1 of
them.

Of course it's lame to use static variables in this way, but OTOH
it's lame if GCC can't recognize them and optimize them away, too.

-Sandra


void
fir (int *coeffs, int coeff_length, int *input, int input_length, int *output)
{
   static int *coeffp;
   static int *inputp;
   static int *outputp;

Here inputp read is rmeoved only at 101.dceloop1 pass. That is really late.
coeffp is removed late, too.

   int i, c, acc;

   for (i = 0; i < input_length; i++)
     {
       coeffp = coeffs;
       inputp = input + i;
       outputp = output + i;
       acc = 0;
       for (c = 0; c < coeff_length; c++)
       {
         acc += *coeffp * *inputp;
         coeffp++;
         inputp--;
       }

It seems like AA can not work out the fact that inputp is unchanged until that
late.  Richi, any ideas?

Well, AA is perfectly fine - it's just that this boils down to a partial
redundancy problem.  The stores can be removed by DCE even with

Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c  (revision 210635)
+++ gcc/tree-ssa-dce.c  (working copy)
@@ -278,10 +278,20 @@ mark_stmt_if_obviously_necessary (gimple
        break;

      case GIMPLE_ASSIGN:
-      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
-         && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
-       return;
-      break;
+      {
+       tree lhs = gimple_assign_lhs (stmt);
+       if (TREE_CODE (lhs) == SSA_NAME
+           && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
+         return;
+       if (TREE_CODE (lhs) == VAR_DECL
+           && !TREE_ADDRESSABLE (lhs)
+           && TREE_STATIC (lhs)
+           && !TREE_PUBLIC (lhs) && !DECL_EXTERNAL (lhs)
+           /* ???  Make sure lhs is only refered to from cfun->decl.  */
+           && decl_function_context (lhs) == cfun->decl)
+         return;
+       break;
+      }

      default:
        break;

but I don't have a good way of checking the ??? prerequesite
(even without taking its address the statics may be refered to
by a) inline copies, b) ipa-split function parts, c) nested functions).
I'm sure IPA references are not kept up-to-date.

The last reads go away with PRE at the expense of two
additional induction variables.

If we know that a static variable does not have its address taken
and is only refered to from a single function we can in theory
simply rewrite it into SSA form during early opts (before inlining
its body), for example by SRA.  That isn't even restricted to
function-local statics (for statics used in multiple functions but
not having address-taken we can do the same with doing
function entry / exit load / store).  I think that would be a much
better IPA enabled optimization than playing games with
looking at individual stmts.

FAOD, I'm not currently planning to work on this any more myself. I understand Bernd's patch pretty well and could make some minor changes to clean it up if that's what it takes to get it approved, but it sounds like what's wanted is a completely different approach, again, and I don't have any confidence that I could implement something that wouldn't just get stuck in another round of "why don't you try X instead" or "maybe it would be better to do this in Y". :-(

Richard, FWIW I think both of the objections you raised to Bernd's patch last year have been resolved now.

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00322.html

We know that an IPA-only implementation (Jan's patch) isn't sufficient to catch the important cases, and the Bernd previously answered the question about inlining here:

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00327.html

-Sandra

Reply via email to