https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112992

--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:79649a5dcd81bc05c0ba591068c9075de43bd417

commit r15-222-g79649a5dcd81bc05c0ba591068c9075de43bd417
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Tue May 7 07:14:40 2024 +0100

    PR target/106060: Improved SSE vector constant materialization on x86.

    This patch resolves PR target/106060 by providing efficient methods for
    materializing/synthesizing special "vector" constants on x86.  Currently
    there are three methods of materializing a vector constant: the most
    general is to load a vector from the constant pool; secondly, "duplicated"
    constants can be synthesized by moving an integer between units and
    broadcasting (or shuffling) it; and finally, the special cases of the
    all-zeros and all-ones vectors can be loaded with a single SSE
    instruction.  This patch handles additional cases that can be synthesized
    in two instructions: loading an all-ones vector followed by one more SSE
    instruction.  Following my recent patch for PR target/112992, there's
    conveniently a single place in i386-expand.cc where these special cases
    can be handled.
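The two single-instruction special cases rest on a simple identity: comparing a register with itself for equality sets every lane to all-ones, and XORing a register with itself clears every lane, whatever the register held. The portable C sketch below models this on the eight 32-bit lanes of a 256-bit register. It is a lane-wise illustration only, not GCC code; the function names and the YMM_DWORDS macro are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define YMM_DWORDS 8 /* a 256-bit %ymm register holds eight 32-bit lanes */

/* Hypothetical model of "vpcmpeqd %ymm0, %ymm0, %ymm0": both operands are
   the same register, so every lane compares equal and the result lane is
   all-ones regardless of the input value. */
void model_cmpeq_self(const uint32_t in[YMM_DWORDS], uint32_t out[YMM_DWORDS])
{
    for (int i = 0; i < YMM_DWORDS; i++) {
        uint32_t a = in[i], b = in[i];   /* same register on both sides */
        out[i] = (a == b) ? UINT32_C(0xFFFFFFFF) : 0;
    }
}

/* Hypothetical model of XORing a register with itself: x ^ x == 0. */
void model_xor_self(const uint32_t in[YMM_DWORDS], uint32_t out[YMM_DWORDS])
{
    for (int i = 0; i < YMM_DWORDS; i++) {
        uint32_t a = in[i], b = in[i];
        out[i] = a ^ b;
    }
}
```

Because neither result depends on the register's previous contents, no load from memory is needed, which is what makes these two constants cheap starting points for the two-instruction sequences.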

    Two examples are given in the original bugzilla PR for 106060.

    __m256i should_be_cmpeq_abs ()
    {
      return _mm256_set1_epi8 (1);
    }

    is now generated (with -O3 -march=x86-64-v3) as:

            vpcmpeqd        %ymm0, %ymm0, %ymm0
            vpabsb  %ymm0, %ymm0
            ret
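Why the first sequence yields _mm256_set1_epi8 (1): vpcmpeqd leaves every byte as 0xFF, which is -1 when read as a signed byte, and vpabsb replaces each signed byte with its absolute value, giving 1. The sketch below models both steps on 32 byte lanes in portable C; the function names and the YMM_BYTES macro are hypothetical, and this is not how GCC itself represents the sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define YMM_BYTES 32 /* a 256-bit register holds 32 byte lanes */

/* Hypothetical model of vpabsb: absolute value of each signed byte.  The
   result is treated as unsigned, so |-128| comes out as 0x80. */
void model_vpabsb(const int8_t in[YMM_BYTES], uint8_t out[YMM_BYTES])
{
    for (int i = 0; i < YMM_BYTES; i++) {
        int v = in[i];                 /* promote to int before negating */
        out[i] = (uint8_t)(v < 0 ? -v : v);
    }
}

/* Hypothetical model of "vpcmpeqd; vpabsb" producing _mm256_set1_epi8 (1). */
void model_set1_epi8_1(uint8_t out[YMM_BYTES])
{
    int8_t ones[YMM_BYTES];
    memset(ones, 0xFF, sizeof ones);   /* vpcmpeqd: all bits set, each byte is -1 */
    model_vpabsb(ones, out);           /* vpabsb: |-1| == 1 in every byte */
}
```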

    and

    __m256i should_be_cmpeq_add ()
    {
      return _mm256_set1_epi8 (-2);
    }

    is now generated as:

            vpcmpeqd        %ymm0, %ymm0, %ymm0
            vpaddb  %ymm0, %ymm0, %ymm0
            ret
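The second sequence relies on wrap-around: vpaddb adds corresponding bytes modulo 256, so adding the all-ones vector to itself gives 0xFF + 0xFF = 0x1FE, truncated to 0xFE, which is -2 as a signed byte. A minimal lane-wise model in C follows; as before, the names are hypothetical and the code only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define YMM_BYTES 32

/* Hypothetical model of vpaddb: byte-wise addition that wraps modulo 256
   (no saturation). */
void model_vpaddb(const uint8_t a[YMM_BYTES], const uint8_t b[YMM_BYTES],
                  uint8_t out[YMM_BYTES])
{
    for (int i = 0; i < YMM_BYTES; i++)
        out[i] = (uint8_t)(a[i] + b[i]);  /* keep only the low 8 bits */
}

/* Hypothetical model of "vpcmpeqd; vpaddb" producing _mm256_set1_epi8 (-2). */
void model_set1_epi8_minus2(uint8_t out[YMM_BYTES])
{
    uint8_t ones[YMM_BYTES];
    memset(ones, 0xFF, sizeof ones);      /* vpcmpeqd: every byte is 0xFF (-1) */
    model_vpaddb(ones, ones, out);        /* (-1) + (-1) == -2, i.e. 0xFE */
}
```

The same doubling applied to wider lanes gives 0xFFFE per 16-bit word or 0xFFFFFFFE per 32-bit dword.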

    2024-05-07  Roger Sayle  <ro...@nextmovesoftware.com>
                Hongtao Liu  <hongtao....@intel.com>

    gcc/ChangeLog
            PR target/106060
            * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
            (struct ix86_vec_bcast_map_simode_t): New type for table below.
            (ix86_vec_bcast_map_simode): Table of SImode constants that may
            be efficiently synthesized by an ix86_vec_bcast_alg method.
            (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
            (ix86_vector_duplicate_simode_const): Efficiently synthesize
            V4SImode and V8SImode constants that duplicate special constants.
            (ix86_vector_duplicate_value): Attempt to synthesize "special"
            vector constants using ix86_vector_duplicate_simode_const.
            * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
            vector integer mode is costed as a single SSE instruction.
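The ChangeLog describes a table of SImode (32-bit) constants, each paired with a synthesis method and searched with bsearch. The portable C sketch below illustrates that lookup pattern. Everything in it is hypothetical: the enum, struct, and function names only echo the ChangeLog, and the entries are not GCC's actual table but a few patterns derived by hand from an all-ones vector (ABS or self-ADD on 8-, 16- or 32-bit lanes).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical synthesis methods, each applied to an all-ones vector. */
enum bcast_alg { ALG_ABS, ALG_ADD };

struct bcast_entry {
    uint32_t key;         /* 32-bit pattern duplicated across the vector */
    enum bcast_alg alg;   /* second instruction after pcmpeqd */
    int lane_bits;        /* element width used by that instruction */
};

/* Hypothetical table, sorted by key so that bsearch can be used. */
static const struct bcast_entry bcast_map[] = {
    { 0x00000001u, ALG_ABS, 32 },  /* pabsd: |-1| per dword     */
    { 0x00010001u, ALG_ABS, 16 },  /* pabsw: |-1| per word      */
    { 0x01010101u, ALG_ABS,  8 },  /* pabsb: |-1| per byte      */
    { 0xFEFEFEFEu, ALG_ADD,  8 },  /* paddb: -1 + -1 per byte   */
    { 0xFFFEFFFEu, ALG_ADD, 16 },  /* paddw: -1 + -1 per word   */
    { 0xFFFFFFFEu, ALG_ADD, 32 },  /* paddd: -1 + -1 per dword  */
};

/* Comparator for bsearch: the key is a pointer to a uint32_t. */
static int bcast_cmp(const void *key, const void *elt)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t e = ((const struct bcast_entry *)elt)->key;
    return (k > e) - (k < e);     /* avoids the overflow of k - e */
}

/* Return the matching entry, or NULL if no two-instruction method applies. */
const struct bcast_entry *lookup_bcast(uint32_t val)
{
    return bsearch(&val, bcast_map, sizeof bcast_map / sizeof bcast_map[0],
                   sizeof bcast_map[0], bcast_cmp);
}
```

Keeping the table sorted lets the standard library's bsearch find an entry in logarithmic time; in this sketch a NULL result simply means the constant must be materialized some other way, such as a constant-pool load or a broadcast.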

    gcc/testsuite/ChangeLog
            PR target/106060
            * gcc.target/i386/auto-init-8.c: Update test case.
            * gcc.target/i386/avx512fp16-13.c: Likewise.
            * gcc.target/i386/pr100865-9a.c: Likewise.
            * gcc.target/i386/pr101796-1.c: Likewise.
            * gcc.target/i386/pr106060-1.c: New test case.
            * gcc.target/i386/pr106060-2.c: Likewise.
            * gcc.target/i386/pr106060-3.c: Likewise.
            * gcc.target/i386/pr70314.c: Update test case.
            * gcc.target/i386/vect-shiftv4qi.c: Likewise.
            * gcc.target/i386/vect-shiftv8qi.c: Likewise.
