https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116275

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:b6fb4f7f651d2aa89548c5833fe2679af2638df5

commit r15-2940-gb6fb4f7f651d2aa89548c5833fe2679af2638df5
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Thu Aug 15 22:02:05 2024 +0100

    i386: Improve split of *extendv2di2_highpart_stv_noavx512vl.

    This patch follows up on the previous patch to fix PR target/116275 by
    improving the code STV (ultimately) generates for highpart sign extensions
    like (x<<8)>>8.  The arithmetic right shift is able to take advantage of
    the available common subexpressions from the preceding left shift.
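
For readers unfamiliar with the idiom, the scalar form of (x<<8)>>8 can be sketched as below. This is only an illustration, not the test case added by the commit; the helper name sext_low56 is hypothetical, and the code assumes GCC's behavior of wrapping on unsigned-to-signed conversion and shifting signed values right arithmetically.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): the "highpart sign extension"
   (x<<8)>>8 on a 64-bit value.  The left shift discards the top 8 bits and
   the arithmetic right shift refills them with copies of bit 55.  */
int64_t sext_low56(int64_t x)
{
    /* Shift as unsigned so that no signed overflow occurs.  */
    uint64_t shifted = (uint64_t)x << 8;

    /* Assumption: the conversion wraps and >> on a negative value is an
       arithmetic shift, as GCC documents for its targets.  */
    return (int64_t)shifted >> 8;
}
```

Only the upper 32 bits of the result can differ from the input, which is the property the improved split relies on.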

    Hence previously with -O2 -m32 -mavx -mno-avx512vl we'd generate:

            vpsllq  $8, %xmm0, %xmm0
            vpsrad  $8, %xmm0, %xmm1
            vpsrlq  $8, %xmm0, %xmm0
            vpblendw        $51, %xmm0, %xmm1, %xmm0

    But with improved splitting, we now generate three instructions:

            vpslld  $8, %xmm1, %xmm0
            vpsrad  $8, %xmm0, %xmm0
            vpblendw        $51, %xmm1, %xmm0, %xmm0
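
One way to see why the three-instruction sequence is sufficient is a scalar model of a single 64-bit element, sketched below. It is an illustration under stated assumptions rather than the splitter's actual RTL: the function names are hypothetical, and the code again assumes GCC's wrapping conversion and arithmetic right shift of signed values.

```c
#include <stdint.h>

/* Reference result: the 64-bit (x<<8)>>8.  */
int64_t ref_sext56(int64_t x)
{
    return (int64_t)((uint64_t)x << 8) >> 8;
}

/* Hypothetical per-element model of the vpslld/vpsrad/vpblendw sequence.  */
int64_t lane_model_sext56(int64_t x)
{
    uint32_t lo = (uint32_t)(uint64_t)x;          /* low dword of the input  */
    uint32_t hi = (uint32_t)((uint64_t)x >> 32);  /* high dword of the input */

    /* vpslld $8 followed by vpsrad $8 acts on each 32-bit lane on its own;
       on the high dword this sign-extends from bit 23, which is bit 55 of
       the full 64-bit element.  */
    int32_t hi_ext = (int32_t)(hi << 8) >> 8;

    /* vpblendw with mask 51 (0x33) covers the four 16-bit words that form
       the low dword of each 64-bit element; those are taken from the
       unshifted input, so only the high dword comes from the shifts.  */
    return (int64_t)(((uint64_t)(uint32_t)hi_ext << 32) | lo);
}
```

The shifted low dword is discarded by the blend, so no 64-bit shift is required and the separate vpsllq/vpsrlq pair of the earlier sequence becomes unnecessary.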

    This patch also implements Uros' suggestion that the pre-reload
    splitter could introduce a new pseudo to hold the intermediate
    to potentially help reload with register allocation, which applies
    when not performing the above optimization, i.e. on TARGET_XOP.

    2024-08-15  Roger Sayle  <ro...@nextmovesoftware.com>
                Uros Bizjak  <ubiz...@gmail.com>

    gcc/ChangeLog
            * config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): Split
            to an improved implementation on !TARGET_XOP.  On TARGET_XOP, use
            a new pseudo for the intermediate to simplify register allocation.

    gcc/testsuite/ChangeLog
            * g++.target/i386/pr116275-2.C: New test case.
