http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48701
           Summary: [missed optimization] GCC fails to use aliasing of ymm and xmm registers
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: kr...@kde.org


The two functions in the attached test case demonstrate the problem. The
intermediate stores and loads on the stack should really be optimized away.

The testStore output is currently:

    vmovdqa %xmm1,-0x30(%rsp)
    vmovdqa %xmm0,-0x20(%rsp)
    vmovdqa -0x30(%rsp),%ymm0
    vmovdqa %ymm0,(<blackhole>)

It should be either:

    vinsertf128 $1,%xmm0,%ymm1,%ymm0
    vmovdqa %ymm0,(<blackhole>)

or:

    vmovdqa %xmm1,(<blackhole>)
    vmovdqa %xmm0,0x10(<blackhole>)

depending on the target microarchitecture and the surrounding code.

Likewise, the testLoad output is currently:

    vmovdqa (<blackhole>),%ymm0
    vmovdqa %ymm0,-0x30(%rsp)
    vmovdqa -0x20(%rsp),%xmm1
    vmovdqa -0x30(%rsp),%xmm0

and should be either:

    vmovdqa (<blackhole>),%ymm0
    vextractf128 $1,%ymm0,%xmm1

or:

    vmovdqa (<blackhole>),%xmm0
    vmovdqa 0x10(<blackhole>),%xmm1

Both register-only forms rely on the aliasing named in the summary: %xmmN is
the low 128 bits of %ymmN, so the low half never has to move, and only the
high half needs a vinsertf128 or vextractf128.
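The attached test case is not included in this text, so the following C sketch is a hypothetical reconstruction, not the reporter's code. It puts a union-based store/load of two 128-bit halves (source that goes through memory) next to AVX intrinsics that correspond to the sequences requested in the report: _mm256_castsi128_si256 plus _mm256_insertf128_si256 for vinsertf128, _mm256_castsi256_si128 plus _mm256_extractf128_si256 for vextractf128, and a pair of 128-bit stores for the split form. The names ymm_halves, store_via_union, store_via_insert, store_two_halves, load_via_union, load_via_extract and variants_agree are made up for illustration. The sketch assumes GCC or Clang on x86-64 (for __attribute__((target("avx"))) and __builtin_cpu_supports) and makes no claim about what any particular compiler version emits for it.

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper type (not from the attached test case): one 256-bit
   value viewed as two 128-bit halves through a union. */
typedef union {
    __m256i y;
    __m128i x[2]; /* x[0] = low half (the part an xmm register aliases), x[1] = high half */
} ymm_halves;

static_assert(sizeof(ymm_halves) == 32, "one ymm value holds exactly two xmm halves");

/* Store: compose the 256-bit value in memory, then store it. */
__attribute__((target("avx")))
static void store_via_union(void *dst, __m128i lo, __m128i hi)
{
    ymm_halves u;
    u.x[0] = lo;
    u.x[1] = hi;
    _mm256_storeu_si256((__m256i *)dst, u.y);
}

/* Store: keep lo as the low lane (the cast normally costs no instruction),
   then fill the high lane with vinsertf128 $1. */
__attribute__((target("avx")))
static void store_via_insert(void *dst, __m128i lo, __m128i hi)
{
    __m256i v = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
    _mm256_storeu_si256((__m256i *)dst, v);
}

/* Store: two 128-bit stores, low half at offset 0, high half at offset 0x10. */
__attribute__((target("avx")))
static void store_two_halves(void *dst, __m128i lo, __m128i hi)
{
    _mm_storeu_si128((__m128i *)dst, lo);
    _mm_storeu_si128((__m128i *)dst + 1, hi);
}

/* Load: load 256 bits, then read the halves back through the union. */
__attribute__((target("avx")))
static void load_via_union(const void *src, __m128i *lo, __m128i *hi)
{
    ymm_halves u;
    u.y = _mm256_loadu_si256((const __m256i *)src);
    *lo = u.x[0];
    *hi = u.x[1];
}

/* Load: the low half is just the xmm view of the ymm value; only the
   high half needs vextractf128 $1. */
__attribute__((target("avx")))
static void load_via_extract(const void *src, __m128i *lo, __m128i *hi)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    *lo = _mm256_castsi256_si128(v);
    *hi = _mm256_extractf128_si256(v, 1);
}

/* Returns 1 if all variants agree, 0 if any differ, -1 if the CPU lacks AVX. */
static int variants_agree(void)
{
    if (!__builtin_cpu_supports("avx"))
        return -1;

    __m128i lo = _mm_setr_epi32(0, 1, 2, 3);
    __m128i hi = _mm_setr_epi32(4, 5, 6, 7);
    unsigned char a[32], b[32], c[32];

    store_via_union(a, lo, hi);
    store_via_insert(b, lo, hi);
    store_two_halves(c, lo, hi);
    if (memcmp(a, b, sizeof a) != 0 || memcmp(a, c, sizeof a) != 0)
        return 0;

    __m128i l1, h1, l2, h2;
    load_via_union(a, &l1, &h1);
    load_via_extract(a, &l2, &h2);
    return memcmp(&l1, &lo, 16) == 0 && memcmp(&h1, &hi, 16) == 0 &&
           memcmp(&l2, &lo, 16) == 0 && memcmp(&h2, &hi, 16) == 0;
}
```

Calling variants_agree() returns 1 when every form writes the same 32 bytes and reads back the same halves, or -1 on a CPU without AVX. The cast intrinsics typically compile to no instruction at all, because the xmm register is the low lane of the ymm register, which is the aliasing the report asks the compiler to exploit for the union version as well. Unaligned loads and stores are used here only so the sketch does not depend on buffer alignment; the report's sequences use vmovdqa on aligned memory.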