https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>: https://gcc.gnu.org/g:88d0f70a326eeb42b479aa537f8a81bf5a199346 commit r12-2557-g88d0f70a326eeb42b479aa537f8a81bf5a199346 Author: Jakub Jelinek <ja...@redhat.com> Date: Wed Jul 28 10:52:51 2021 +0200 i386: Improve AVX2 expansion of vector >> vector DImode arithm. shifts [PR101611] AVX2 introduced vector >> vector shifts, but unfortunately for V{2,4}DImode it only supports logical and not arithmetic shifts, only AVX512F for V8DImode or AVX512VL for V{2,4}DImode fixed that omission. Earlier in GCC12 cycle I've committed vector >> scalar arithmetic shift emulation using various sequences, this patch handles the vector >> vector case. No need to adjust costs, the previous cost adjustment actually covers even the vector by vector shifts. The patch emits the right arithmetic V{2,4}DImode shifts using 2 logical right V{2,4}DImode shifts (once of the original operands, once of sign mask constant by the vector shift count), xor and subtraction, on each element (long long) x >> y is done as (((unsigned long long) x >> y) ^ (0x8000000000000000ULL >> y)) - (0x8000000000000000ULL >> y) i.e. if x doesn't have in some element the MSB set, it is just the logical shift, if it does, then the xor and subtraction cause also all higher bits to be set. 2021-07-28 Jakub Jelinek <ja...@redhat.com> PR target/101611 * config/i386/sse.md (vashr<mode>3): Split into vashrv8di3 expander and vashrv4di3 expander, where the latter requires just TARGET_AVX2 and has special !TARGET_AVX512VL expansion. (vashrv2di3<mask_name>): Rename to ... (vashrv2di3): ... this. Change condition to TARGET_XOP || TARGET_AVX2 and add special !TARGET_XOP && !TARGET_AVX512VL expansion. * gcc.target/i386/avx2-pr101611-1.c: New test. * gcc.target/i386/avx2-pr101611-2.c: New test.