https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Looking at what other compilers emit for this, ICC seems to be completely
broken: it emits logical right shifts instead of arithmetic right shifts.
LLVM trunk emits for >> 63 what this patch emits; for >> 17 it emits
        vpsrad  $17, %xmm0, %xmm1
        vpsrlq  $17, %xmm0, %xmm0
        vpblendd        $10, %xmm1, %xmm0, %xmm0
instead of
        vpxor   %xmm1, %xmm1, %xmm1
        vpcmpgtq        %xmm0, %xmm1, %xmm1
        vpsrlq  $17, %xmm0, %xmm0
        vpsllq  $47, %xmm1, %xmm1
        vpor    %xmm1, %xmm0, %xmm0
the patch emits.  For >> 47 LLVM emits:
        vpsrad  $31, %xmm0, %xmm1
        vpsrad  $15, %xmm0, %xmm0
        vpshufd $245, %xmm0, %xmm0
        vpblendd        $10, %xmm1, %xmm0, %xmm0
etc.
So, in summary, for >> 63 with SSE4.2 I think what the patch does looks best,
for >> 63 and SSE2 we can emit psrad $31 instead and permute the odd elements
into even ones (i.e. __builtin_shuffle ((v4si) x >> 31, { 1, 1, 3, 3 })).
For >> cst where cst < 32, do a psrad and psrlq by that cst and permute such
that we get the even SI elts from the psrlq result and odd from psrad result.
For >> 32, do a psrad $31 and permute to get the even SI elts from odd elts of
the source and odd SI elts from odd elts of the psrad $31 result.
For >> cst where cst > 32, do psrad $31 and psrad $(cst-32) and permute
such that even SI elts come from odd elts of the latter and odd elts come from
odd elts of the former.
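
The case split above can be checked with a small scalar model. This is only a
sketch, not the patch itself: it treats one V2DI element as two SI lanes
(even = low 32 bits, odd = high 32 bits), models psrad and psrlq with plain
integer arithmetic, and compares the result against a reference 64-bit
arithmetic shift. The names sra32, sra64_ref, sra64_lanes and
count_mismatches are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Models psrad on one SI lane: arithmetic right shift by n (0..31),
   written with unsigned arithmetic so it does not depend on how the
   compiler shifts negative values.  */
static uint32_t sra32(uint32_t v, unsigned n)
{
    uint32_t sign = (v >> 31) ? 0xFFFFFFFFu : 0u;
    if (n == 0)
        return v;
    return (v >> n) | (sign << (32 - n));
}

/* Reference: arithmetic right shift of one DI element by n (1..63).  */
static uint64_t sra64_ref(uint64_t v, unsigned n)
{
    uint64_t sign = (v >> 63) ? ~(uint64_t)0 : 0;
    return (v >> n) | (sign << (64 - n));
}

/* The lane-wise strategy from the summary, for n in 1..63.  */
static uint64_t sra64_lanes(uint64_t v, unsigned n)
{
    uint32_t src_odd = (uint32_t)(v >> 32);   /* odd SI elt of the source */
    uint32_t even, odd;

    if (n < 32) {
        even = (uint32_t)(v >> n);            /* even elt of psrlq $n */
        odd  = sra32(src_odd, n);             /* odd elt of psrad $n */
    } else if (n == 32) {
        even = src_odd;                       /* odd elt of the source */
        odd  = sra32(src_odd, 31);            /* odd elt of psrad $31 */
    } else {
        even = sra32(src_odd, n - 32);        /* odd elt of psrad $(n-32) */
        odd  = sra32(src_odd, 31);            /* odd elt of psrad $31 */
    }
    return ((uint64_t)odd << 32) | even;
}

/* Counts disagreements with the reference over a few sample values and
   every shift count from 1 to 63.  */
static int count_mismatches(void)
{
    static const uint64_t samples[] = {
        0x0000000000000000u, 0x0000000000000001u, 0x7FFFFFFFFFFFFFFFu,
        0x8000000000000000u, 0xFFFFFFFFFFFFFFFFu, 0x0123456789ABCDEFu,
        0xFEDCBA9876543210u, 0x00000000FFFFFFFFu, 0xFFFFFFFF00000000u,
        0x8000000080000000u
    };
    int bad = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned n = 1; n < 64; n++)
            if (sra64_lanes(samples[i], n) != sra64_ref(samples[i], n))
                bad++;
    return bad;
}
```

Note that for n == 63 the last branch degenerates to both lanes being the odd
elt of psrad $31, i.e. the { 1, 1, 3, 3 } shuffle suggested for SSE2.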
