https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #10 from hubicka at kam dot mff.cuni.cz ---
>       | b = 2.0 * ray.dir.x * (ray.orig.x - sph->pos.x) +
>       | movupd (%rdi),%xmm5
>       | 2.0 * ray.dir.y * (ray.orig.y - sph->pos.y) +
>       | 2.0 * ray.dir.z * (ray.orig.z - sph->pos.z);
>  0.02 | movsd 0x10(%rdi),%xmm9
>  0.01 | movupd 0xb8(%rsp),%xmm13
> 37.67 | movupd 0xa0(%rsp),%xmm15
>
> so we pass struct ray on the stack(?) and perform SSE loads from it but
> the argument passing does
>
>  0.88 | movups %xmm2,(%rsp)
>  0.22 | movups %xmm3,0x10(%rsp)
> 43.81 | movups %xmm4,0x20(%rsp)
>  0.66 | call ray_sphere

Adding Martin to CC. I think we could teach ipa-sra to, with -flto, turn the
structure either to scalar arguments or to be passed by reference which would
allow us to hoist its initialization out of the loop body.

Honza
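
To illustrate the by-reference variant of the idea, here is a rough source-level sketch, not what ipa-sra emits. The struct layouts are an assumption (six doubles for struct ray, which would match the three 16-byte stores in the quoted profile), and the names b_by_value, b_by_ref, sum_by_value and sum_by_ref are made up for this example. The NOINLINE macro only keeps the calls visible in the generated assembly.

```c
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Assumed layouts, modeled on the quoted source line. */
struct vec3 { double x, y, z; };
struct ray { struct vec3 orig, dir; };          /* 48 bytes */
struct sphere { struct vec3 pos; double rad; };

/* By value: the whole struct ray is copied into the outgoing
   argument area for every call. */
NOINLINE double b_by_value(const struct sphere *sph, struct ray ray)
{
    return 2.0 * ray.dir.x * (ray.orig.x - sph->pos.x) +
           2.0 * ray.dir.y * (ray.orig.y - sph->pos.y) +
           2.0 * ray.dir.z * (ray.orig.z - sph->pos.z);
}

/* By reference: only a pointer is passed; the callee loads the
   fields from the caller's single copy. */
NOINLINE double b_by_ref(const struct sphere *sph, const struct ray *ray)
{
    return 2.0 * ray->dir.x * (ray->orig.x - sph->pos.x) +
           2.0 * ray->dir.y * (ray->orig.y - sph->pos.y) +
           2.0 * ray->dir.z * (ray->orig.z - sph->pos.z);
}

/* Loop as written today: the argument copy sits inside the loop body. */
double sum_by_value(const struct sphere *sph, int n, struct ray ray)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += b_by_value(&sph[i], ray);
    return sum;
}

/* Loop after the hypothetical transformation: ray stays in one place
   for the whole loop and no per-iteration copy is needed. */
double sum_by_ref(const struct sphere *sph, int n, struct ray ray)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += b_by_ref(&sph[i], &ray);
    return sum;
}
```

Both loops compute the same value; the difference is only in whether the stores that set up the argument have to be repeated on each iteration.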