https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271
--- Comment #9 from cuilili <lili.cui at intel dot com> ---
I really appreciate your reply. I debugged the SRA pass with the small testcase and found that SRA does not handle this situation. SRA cannot split the callee's first parameter because of this rule:

"Do not decompose non-BLKmode parameters in a way that would create a BLKmode parameter. Especially for pass-by-reference (hence, pointer type parameters), it's not worth it."

Before inline:
  For caller
    store-1: 128-bit store of struct "a" (this store is implicit during the IPA passes; it can only be found after a certain pass.)
  For callee
    load-1:  128-bit load of struct "a" for operation "c->a=(*a)"
    store-2: 128-bit store of struct "c->a" for operation "c->a=(*a)"
    load-2:  4 * 32-bit loads for c->a.f1, c->a.f2, c->a.f3 and c->a.f4 (because store-2 uses a vector register, we cannot use that register directly here.)

After inline:
  For caller
    None.
  For callee
    store-2: 128-bit store of struct "c->a" for operation "c->a=(*a)"

--------------------------------------------------------
int callee (struct A *a, struct C *c)
{
  c->a=(*a);

  if ((c->b + 7) & 17)
    {
      c->a.f1 = c->a.f2 + c->a.f3;
      c->a.f2 = c->a.f2 - c->a.f3;
      c->a.f3 = c->a.f2 + c->a.f3;
      c->a.f4 = c->a.f2 - c->a.f3;
      c->b = c->a.f2 + c->a.f4;
      return 0;
    }
  return 1;
}

int caller (int d, struct C *c)
{
  struct A a;
  a.f1 = 1 + d;
  a.f2 = 2;
  a.f3 = 12 + d;
  a.f4 = 68 + d;
  if (d > 0)
    return callee (&a, c);
  else
    return 1;
}
-------------------------------------------------

In 538.imagick_r (c-ray has similar code), the structure involved in the redundant store and load is 256 bits (4 elements of 64 bits each). Inlining the hot function there eliminates one 256-bit store, one 256-bit load, and four 64-bit loads.

Could I do it like this: compute the total size of all callee arguments for which inlining can eliminate redundant loads and stores?

Thanks!
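
To make the proposal concrete, here is a rough sketch of the size computation as a standalone toy model. It is not GCC code: struct arg_summary, its two flags, and the function names are all hypothetical, and how the inliner would actually detect "caller stores a local aggregate, callee loads it through the pointer" is left open. The 128-bit and 256-bit sizes are the ones quoted above; the size of struct C is not given, so it is left as 0 (it does not qualify anyway).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-argument summary for one call site (not a GCC type).  */
struct arg_summary
{
  unsigned size_bits;          /* Size of the pointed-to aggregate.  */
  bool caller_local_aggregate; /* Argument is the address of an aggregate
                                  built (stored) in the caller.  */
  bool callee_loads_aggregate; /* Callee loads the aggregate through the
                                  pointer.  */
};

/* Sum the sizes of all arguments whose caller store / callee load pair
   would become redundant once the callee is inlined.  */
unsigned
removable_arg_bits (const struct arg_summary *args, size_t n)
{
  unsigned total = 0;

  assert (args != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (args[i].caller_local_aggregate && args[i].callee_loads_aggregate)
      total += args[i].size_bits;
  return total;
}

/* The small testcase: callee (&a, c).  */
unsigned
small_testcase_bits (void)
{
  const struct arg_summary args[] = {
    /* &a: four 32-bit fields, filled in by the caller.  */
    { .size_bits = 128, .caller_local_aggregate = true,
      .callee_loads_aggregate = true },
    /* c: passed in from outside the caller, so it does not qualify.  */
    { .size_bits = 0, .caller_local_aggregate = false,
      .callee_loads_aggregate = false },
  };
  return removable_arg_bits (args, sizeof args / sizeof args[0]);
}

/* A 256-bit structure (four 64-bit elements), as in the 538.imagick_r
   case described above.  */
unsigned
wide_struct_bits (void)
{
  const struct arg_summary args[] = {
    { .size_bits = 256, .caller_local_aggregate = true,
      .callee_loads_aggregate = true },
  };
  return removable_arg_bits (args, sizeof args / sizeof args[0]);
}
```

With this model the small testcase yields 128 bits and the wider case yields 256 bits. The idea would be to feed such a total into the inlining decision as a bonus; the exact weighting is not worked out here.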