https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #7)
> (In reply to Andrew Pinski from comment #5)
> > clang can now produce:
> >         mov     eax, dword ptr [esp + 16]
> >         mov     ecx, dword ptr [esp + 28]
> >         vmovdqu xmm0, xmmword ptr [ecx + 32]
> >         vmovdqu xmm1, xmmword ptr [eax]
> >         vpackuswb       xmm2, xmm1, xmm0
> >         vpsubw  xmm0, xmm1, xmm0
> >         vpaddw  xmm0, xmm0, xmm2
> >         vpackuswb       xmm0, xmm0, xmm0
> >         vpackuswb       xmm0, xmm0, xmm0
> >         vpextrd eax, xmm0, 1
> >         ret
> >
> > I suspect if the back-end is able to "fold" at the gimple level the builtins
> > into gimple, GCC will do a much better job.
> > Currently we have stuff like:
> >   _27 = __builtin_ia32_vextractf128_si256 (_28, 0);
> >   _26 = __builtin_ia32_vec_ext_v4si (_27, 1); [tail call]
> >
> > I think both are just a BIT_FIELD_REF really and even more can be simplified
> > to just one bitfield extraction rather than what we do now:
> >         vpackuswb       %ymm1, %ymm0, %ymm0
> >         vpextrd $1, %xmm0, %eax
> >
> > Plus it looks like with __builtin_ia32_vextractf128_si256 (_28, 0), clang is
> > able to remove half of the code due to only needing 128 bytes stuff :).
>
> Yes, let's me try this.

Do we have IR for unsigned/signed saturation in gimple level?
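
For reference, here is a rough scalar model in C of the two points above: what an unsigned-saturating pack has to compute per element, and why extracting one 32-bit element from the low 128 bits of a 256-bit vpackuswb result only needs the 128-bit operation. This is only a sketch. All function names (sat_u8, pack128, pack256, extract_u32, wide_then_extract, narrow_direct) are made up for illustration and are not GCC internals, and the byte-array "vectors" are an assumption standing in for the real vector types.

```c
#include <stdint.h>
#include <string.h>

/* Unsigned saturation of a signed 16-bit word to 8 bits,
   which is what vpackuswb does for each element. */
static uint8_t sat_u8(int16_t w)
{
    if (w < 0)
        return 0;
    if (w > 255)
        return 255;
    return (uint8_t)w;
}

/* Model of 128-bit vpackuswb: 8 saturated words of a, then 8 of b. */
static void pack128(const int16_t a[8], const int16_t b[8], uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        out[i] = sat_u8(a[i]);
        out[8 + i] = sat_u8(b[i]);
    }
}

/* Model of 256-bit vpackuswb: each 128-bit lane is packed independently. */
static void pack256(const int16_t a[16], const int16_t b[16], uint8_t out[32])
{
    pack128(a, b, out);
    pack128(a + 8, b + 8, out + 16);
}

/* Model of a 32-bit element extract, i.e. reading 32 bits at
   bit offset 32 * idx (roughly what a BIT_FIELD_REF would express). */
static uint32_t extract_u32(const uint8_t *bytes, int idx)
{
    uint32_t v;
    memcpy(&v, bytes + 4 * idx, sizeof v);
    return v;
}

/* What the builtin sequence does: 256-bit pack, take the low 128 bits
   (vextractf128 with index 0), then take element 1 (vec_ext_v4si). */
static uint32_t wide_then_extract(const int16_t a[16], const int16_t b[16])
{
    uint8_t full[32], low[16];
    pack256(a, b, full);
    memcpy(low, full, sizeof low);
    return extract_u32(low, 1);
}

/* The simplified form: only the low 128-bit lane is ever needed. */
static uint32_t narrow_direct(const int16_t a[16], const int16_t b[16])
{
    uint8_t low[16];
    pack128(a, b, low);
    return extract_u32(low, 1);
}
```

In this model wide_then_extract and narrow_direct always agree, because the low lane of the 256-bit pack depends only on the low halves of its inputs. The sat_u8 helper is the part that would need some saturation representation at the gimple level before the pack builtin itself could be folded.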