[Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily

cvs-commit at gcc dot gnu.org via Gcc-bugs Wed, 28 Jul 2021 01:53:47 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611


--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:88d0f70a326eeb42b479aa537f8a81bf5a199346

commit r12-2557-g88d0f70a326eeb42b479aa537f8a81bf5a199346
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Wed Jul 28 10:52:51 2021 +0200

    i386: Improve AVX2 expansion of vector >> vector DImode arithm. shifts [PR101611]

    AVX2 introduced vector >> vector shifts, but unfortunately for V{2,4}DImode
    it only supports logical and not arithmetic shifts; only AVX512F for
    V8DImode or AVX512VL for V{2,4}DImode fixed that omission.
    Earlier in the GCC 12 cycle I committed vector >> scalar arithmetic shift
    emulation using various sequences; this patch handles the vector >> vector
    case.  No need to adjust costs, the previous cost adjustment actually
    covers even the vector by vector shifts.
    The patch emits the right arithmetic V{2,4}DImode shifts using 2 logical
    right V{2,4}DImode shifts (one of the original operand and one of a sign
    mask constant, both by the vector shift count), an xor and a subtraction.
    On each element, (long long) x >> y is done as
    (((unsigned long long) x >> y) ^ (0x8000000000000000ULL >> y))
    - (0x8000000000000000ULL >> y)
    i.e. if an element of x doesn't have the MSB set, the result is just the
    logical shift; if it does, the xor and subtraction also set all the
    higher bits.
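
The identity above can be checked on a single element with a small scalar sketch. This is only an illustration of the arithmetic, not the code the patch emits; the helper name ashr_emulated is hypothetical, and the sketch assumes a shift count in the range 0..63.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the patch): emulate the arithmetic
   shift (long long) x >> y using only logical right shifts, an xor and
   a subtraction.  Assumes 0 <= y <= 63.  */
static int64_t
ashr_emulated (int64_t x, unsigned y)
{
  /* Sign mask constant shifted by the same count as the operand.  */
  uint64_t sign = UINT64_C (0x8000000000000000) >> y;
  /* Logical right shift of the original operand.  */
  uint64_t lsr = (uint64_t) x >> y;
  /* If the shifted sign bit is clear in lsr, the xor sets it and the
     subtraction clears it again, leaving lsr unchanged.  If it is set,
     the xor clears it and the subtraction borrows through all higher
     bits, filling them with ones.  */
  return (int64_t) ((lsr ^ sign) - sign);
}
```

For example, ashr_emulated (-8, 1) yields -4 and ashr_emulated (8, 1) yields 4. The final conversion from uint64_t to int64_t is implementation-defined in ISO C; GCC defines it as reduction modulo 2^64, which is what the sketch relies on.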

    2021-07-28  Jakub Jelinek  <ja...@redhat.com>

            PR target/101611
            * config/i386/sse.md (vashr<mode>3): Split into vashrv8di3 expander
            and vashrv4di3 expander, where the latter requires just TARGET_AVX2
            and has special !TARGET_AVX512VL expansion.
            (vashrv2di3<mask_name>): Rename to ...
            (vashrv2di3): ... this.  Change condition to
            TARGET_XOP || TARGET_AVX2 and add special
            !TARGET_XOP && !TARGET_AVX512VL expansion.

            * gcc.target/i386/avx2-pr101611-1.c: New test.
            * gcc.target/i386/avx2-pr101611-2.c: New test.
