[Bug target/77287] Much worse code generated compared to clang (stack alignment and spills)

crazylht at gmail dot com via Gcc-bugs Tue, 24 Aug 2021 20:30:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287


--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #7)
> (In reply to Andrew Pinski from comment #5)
> > clang can now produce:
> >         mov     eax, dword ptr [esp + 16]
> >         mov     ecx, dword ptr [esp + 28]
> >         vmovdqu xmm0, xmmword ptr [ecx + 32]
> >         vmovdqu xmm1, xmmword ptr [eax]
> >         vpackuswb       xmm2, xmm1, xmm0
> >         vpsubw  xmm0, xmm1, xmm0
> >         vpaddw  xmm0, xmm0, xmm2
> >         vpackuswb       xmm0, xmm0, xmm0
> >         vpackuswb       xmm0, xmm0, xmm0
> >         vpextrd eax, xmm0, 1
> >         ret
> > 
> > I suspect that if the back-end were able to "fold" these builtins into
> > generic gimple at the gimple level, GCC would do a much better job.
> > Currently we have stuff like:
> > _27 = __builtin_ia32_vextractf128_si256 (_28, 0);
> > _26 = __builtin_ia32_vec_ext_v4si (_27, 1); [tail call]
> > 
> > I think both are really just a BIT_FIELD_REF, and the pair can even be
> > simplified to a single bit-field extraction rather than what we do now:
> >         vpackuswb       %ymm1, %ymm0, %ymm0
> >         vpextrd $1, %xmm0, %eax
> > 
> > Plus, it looks like with __builtin_ia32_vextractf128_si256 (_28, 0), clang is
> > able to remove half of the code because only the low 128 bits are needed :).
> 
> Yes, let me try this.
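The folding idea quoted above can be illustrated in portable C. This is a minimal sketch that uses the GCC/Clang `vector_size` extension rather than AVX intrinsics. The type names `v8si`/`v4si` and the functions `extract_two_step`/`extract_one_step` are hypothetical and exist only for illustration. The two-step function imitates the quoted GIMPLE: take the low 128-bit half, then take element 1 of that half. The one-step function reads element 1 of the 256-bit vector directly. As I understand it, the direct read is what a single BIT_FIELD_REF (32 bits at bit position 32) would express.

```c
#include <stdint.h>
#include <string.h>

/* Generic vector types (GCC/Clang vector_size extension); no target
   intrinsics are used, so this compiles without -mavx. */
typedef int32_t v8si __attribute__((vector_size(32)));
typedef int32_t v4si __attribute__((vector_size(16)));

/* Two-step form, modelled on the quoted GIMPLE: take the low 128-bit
   half (like vextractf128 with immediate 0), then element 1 of that
   half (like vec_ext_v4si with index 1). */
static int32_t extract_two_step(v8si v)
{
    v4si lo;
    memcpy(&lo, &v, sizeof lo);  /* first 16 bytes = elements 0..3 */
    return lo[1];
}

/* One-step form: read element 1 of the 256-bit vector directly,
   i.e. one 32-bit field at bit position 32. */
static int32_t extract_one_step(v8si v)
{
    return v[1];
}
```

Both functions return the same value for any input. That is why the extract-then-extract pair can be collapsed into one bit-field extraction, and why only the low 128 bits of the source ever need to be computed.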


Do we have an IR representation for unsigned/signed saturation at the gimple level?
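For reference, here is the per-element saturation that vpackuswb performs in the clang output above, along with its signed counterpart (packsswb). This is a minimal scalar sketch, assuming intrinsic operand order (`_mm_packus_epi16(a, b)`, where the low 8 bytes come from `a`). The helper names `sat_s16_to_u8`, `sat_s16_to_s8` and `packus_epi16_model` are hypothetical.

```c
#include <stdint.h>

/* Unsigned saturation: clamp a signed 16-bit value to [0, 255],
   then narrow. Per-element operation of (v)packuswb. */
static uint8_t sat_s16_to_u8(int16_t x)
{
    int32_t c = x < 0 ? 0 : (x > UINT8_MAX ? UINT8_MAX : x);
    return (uint8_t)c;
}

/* Signed saturation: clamp to [-128, 127], then narrow.
   Per-element operation of (v)packsswb. */
static int8_t sat_s16_to_s8(int16_t x)
{
    int32_t c = x < INT8_MIN ? INT8_MIN : (x > INT8_MAX ? INT8_MAX : x);
    return (int8_t)c;
}

/* Scalar model of a 128-bit packuswb in intrinsic order:
   low 8 bytes from a, high 8 bytes from b. */
static void packus_epi16_model(const int16_t a[8], const int16_t b[8],
                               uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = sat_s16_to_u8(a[i]);
        out[i + 8] = sat_s16_to_u8(b[i]);
    }
}
```

The saturation is deliberately written as a clamp followed by a truncating conversion. In that form it needs only min/max-style comparisons and a narrowing conversion. That is one way such an operation could be spelled in an IR that lacks a dedicated saturation code, though whether the middle-end would recognise and re-form it is exactly the open question.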
