https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271

--- Comment #9 from cuilili <lili.cui at intel dot com> ---
I really appreciate your reply. I debugged the SRA pass with the small testcase
and found that SRA does not handle this situation.

SRA cannot split the callee's first parameter because of this rule: "Do not
decompose non-BLKmode parameters in a way that would create a BLKmode
parameter. Especially for pass-by-reference (hence, pointer type parameters),
it's not worth it."

Before inlining:
For caller
store-1 :   128-bit store of struct "a" (this is an implicit store during the
IPA passes; the store can only be found after a certain pass.)
For callee
load-1 :    128-bit load of struct "a" for operation "c->a=(*a)"
store-2 :   128-bit store of struct "c->a" for operation "c->a=(*a)"
load-2 :    4 * 32-bit loads for c->a.f1, c->a.f2, c->a.f3 and c->a.f4
(because store-2 uses a vector register for the store, we cannot use that
register directly here.)

After inlining:
For caller
None.
For callee
store-2 :   128-bit store of struct "c->a" for operation "c->a=(*a)"

--------------------------------------------------------
int callee (struct A *a, struct C *c)
{
  c->a=(*a);   
  if ((c->b + 7) & 17)
    {
      c->a.f1 = c->a.f2 + c->a.f3;
      c->a.f2 = c->a.f2 - c->a.f3;
      c->a.f3 = c->a.f2 + c->a.f3;
      c->a.f4 = c->a.f2 - c->a.f3;
      c->b = c->a.f2 + c->a.f4;
      return 0;
    }
  return 1;
}

int caller (int d, struct C *c)
{
  struct A a;
  a.f1 = 1 + d;
  a.f2 = 2;
  a.f3 = 12 + d;
  a.f4 = 68 + d;
  if (d > 0)
    return callee (&a, c);
  else
    return 1;
}
-------------------------------------------------
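To make the before/after comparison concrete, here is a hand-written sketch of
one shape the caller could take once callee is inlined and the local "a" is
scalarized. The struct layouts are assumptions (the testcase above does not
show them; four int fields match the 128-bit size mentioned), and
caller_inlined is a hypothetical name, not compiler output.

```c
/* Assumed layouts: not shown in the original testcase.
   Four 32-bit fields give the 128-bit struct described above. */
struct A { int f1, f2, f3, f4; };
struct C { struct A a; int b; };

/* Hand-inlined equivalent of caller(): the fields of the former local "a"
   stay in scalars, so store-1, load-1 and load-2 are no longer needed. */
int caller_inlined (int d, struct C *c)
{
  if (d <= 0)
    return 1;

  int f1 = 1 + d;
  int f2 = 2;
  int f3 = 12 + d;
  int f4 = 68 + d;
  int ret = 1;

  if ((c->b + 7) & 17)
    {
      /* Same update order as in callee(), but on scalars. */
      f1 = f2 + f3;
      f2 = f2 - f3;
      f3 = f2 + f3;
      f4 = f2 - f3;
      c->b = f2 + f4;
      ret = 0;
    }

  /* Only the final values are written to c->a. */
  c->a.f1 = f1;
  c->a.f2 = f2;
  c->a.f3 = f3;
  c->a.f4 = f4;
  return ret;
}
```

Whether GCC actually produces this form depends on the later scalar passes;
the sketch only shows which memory traffic becomes removable.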
In 538.imagick_r (c_ray has similar code too), the redundantly stored and
loaded structure is 256 bits (4 elements of 64 bits each), so inlining the hot
function eliminates one 256-bit store, one 256-bit load, and four 64-bit loads.
Can I do it like this: compute the total size of all callee arguments for which
inlining can eliminate redundant loads and stores? Thanks!
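A minimal sketch of the metric proposed above, under stated assumptions: the
struct arg_summary type, its fields, and removable_bits are all hypothetical
names invented for illustration, not GCC data structures or functions. It
simply sums the sizes of the arguments whose store/load pair would become
redundant after inlining.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-argument summary; not a GCC structure. */
struct arg_summary
{
  size_t size_bits;          /* size of the pointed-to aggregate */
  bool caller_stores_local;  /* caller fills a local aggregate and passes its address */
  bool callee_loads_whole;   /* callee reads the whole aggregate through the pointer */
};

/* Total size, in bits, of the arguments for which inlining could remove
   a redundant store in the caller and the matching load in the callee. */
size_t removable_bits (const struct arg_summary *args, size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    if (args[i].caller_stores_local && args[i].callee_loads_whole)
      total += args[i].size_bits;
  return total;
}
```

For the testcase above, only the first argument of callee would qualify, so the
sum would be 128 bits; how such a number should feed into the inliner's
existing heuristics is left open here.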
