[Bug tree-optimization/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

hubicka at kam dot mff.cuni.cz via Gcc-bugs Thu, 28 Oct 2021 06:09:36 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908


--- Comment #10 from hubicka at kam dot mff.cuni.cz ---
>        |     b = 2.0 * ray.dir.x * (ray.orig.x - sph->pos.x) +
>        |       movupd   (%rdi),%xmm5
>        |     2.0 * ray.dir.y * (ray.orig.y - sph->pos.y) +
>        |     2.0 * ray.dir.z * (ray.orig.z - sph->pos.z);
>   0.02 |       movsd    0x10(%rdi),%xmm9
>   0.01 |       movupd   0xb8(%rsp),%xmm13
>  37.67 |       movupd   0xa0(%rsp),%xmm15
> 
> so we pass struct ray on the stack(?) and perform SSE loads from it but
> the argument passing does
> 
>   0.88 |       movups %xmm2,(%rsp)
>   0.22 |       movups %xmm3,0x10(%rsp)
>  43.81 |       movups %xmm4,0x20(%rsp)
>   0.66 |       call   ray_sphere

Adding Martin to CC.  I think we could teach ipa-sra, with -flto, to
turn the structure into either scalar arguments or an argument passed
by reference, which would allow us to hoist its initialization out of
the loop body.
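
To illustrate the idea at the source level, here is a minimal sketch in C.
It is not the c-ray source: the types are simplified, hypothetical stand-ins
that model only the fields used in the quoted expression, and the function
names (b_term_by_value, b_term_by_ref, sum_by_value, sum_by_ref) are made
up for this example.  The by-value variant corresponds to what the profile
above seems to show (the caller stores the 48-byte struct to the stack
before every call); the by-reference variant is roughly what the suggested
transformation would produce, assuming the callee does not modify the ray.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the c-ray types. */
struct vec3 { double x, y, z; };
struct ray { struct vec3 orig, dir; };        /* 48 bytes of doubles */
struct sphere { struct vec3 pos; double rad; };

/* By value: a struct this large is passed in memory on x86-64 SysV,
   so every call site has to copy it to the outgoing argument area. */
double b_term_by_value(struct ray ray, const struct sphere *sph)
{
    return 2.0 * ray.dir.x * (ray.orig.x - sph->pos.x) +
           2.0 * ray.dir.y * (ray.orig.y - sph->pos.y) +
           2.0 * ray.dir.z * (ray.orig.z - sph->pos.z);
}

/* By reference: the callee reads the caller's copy directly, so no
   per-call store of the struct is needed. */
double b_term_by_ref(const struct ray *ray, const struct sphere *sph)
{
    return 2.0 * ray->dir.x * (ray->orig.x - sph->pos.x) +
           2.0 * ray->dir.y * (ray->orig.y - sph->pos.y) +
           2.0 * ray->dir.z * (ray->orig.z - sph->pos.z);
}

/* Loop that re-passes the same ray by value on every iteration. */
double sum_by_value(struct ray ray, const struct sphere *sph, size_t n)
{
    double sum = 0.0;
    assert(sph != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += b_term_by_value(ray, &sph[i]);
    return sum;
}

/* Same loop with the ray set up once, outside the loop, and passed
   by pointer on every iteration. */
double sum_by_ref(const struct ray *ray, const struct sphere *sph, size_t n)
{
    double sum = 0.0;
    assert(ray != NULL && (sph != NULL || n == 0));
    for (size_t i = 0; i < n; i++)
        sum += b_term_by_ref(ray, &sph[i]);
    return sum;
}
```

Both loops compute the same value; the difference is only in how the ray
reaches the callee.  When the callee is small enough to be inlined the
copy disappears anyway, so the sketch matters only for an out-of-line
call like the one to ray_sphere in the profile.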

Honza
