[Bug target/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb

ubizjak at gmail dot com Wed, 03 Jun 2015 11:01:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369


Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2015-06-03
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com
     Ever confirmed|0                           |1

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 35693
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35693&action=edit
Patch to add zero-extended MOVMSK patterns

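This patch adds zero-extended MOVMSK patterns.

However, the source needs one more cast, from (int) to (unsigned int), because
the intrinsic is defined to return int. Without the cast, assigning the result
to a long sign-extends it, so the zero-extended pattern does not apply: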

   long v;

   regchx256 = _mm256_set1_epi8( ch );
   regset256 = _mm256_loadu_si256( (__m256i const *) set );
   v = (unsigned int) _mm256_movemask_epi8
                       ( _mm256_cmpeq_epi8(regchx256,regset256) );
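
To see why the cast matters, here is a minimal sketch in plain C, without the
intrinsics. The helper names widen_signed and widen_unsigned are made up for
illustration, and int64_t stands in for long on an LP64 target. A mask with
bit 31 set is a negative int, so the implicit conversion to a 64-bit type
sign-extends it, while going through unsigned int zero-extends it:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: implicit int -> 64-bit conversion, as in
   "long v = _mm256_movemask_epi8(...)". The value is sign-extended. */
static int64_t widen_signed(int mask)
{
    return mask;
}

/* Hypothetical helper: the same conversion with the extra cast, as in
   "long v = (unsigned int) _mm256_movemask_epi8(...)". The value is
   zero-extended. */
static int64_t widen_unsigned(int mask)
{
    return (unsigned int)mask;
}
```

With mask == INT_MIN (only bit 31 set), widen_signed gives a negative value
with the upper 32 bits all ones, and widen_unsigned gives 2147483648 with the
upper 32 bits clear. For masks with bit 31 clear the two agree.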

Using patched gcc, the code compiles to:

lookup32:
        vmovdqu charset32(%rip), %ymm0  # 10    *avx_loaddquv32qi
        vmovd   %edi, %xmm1     # 54    vec_setv4si_0/4
        movl    $11141307, %eax # 5     *movdi_internal/3
        vpbroadcastb    %xmm1, %ymm1    # 55    avx2_pbroadcastv32qi
        vpcmpeqb        %ymm0, %ymm1, %ymm0     # 13    *avx2_eqv32qi3
        vpmovmskb       %ymm0, %edx     # 16    *avx2_pmovmskb_zext
        testl   %edx, %edx      # 19    *cmpsi_ccno_1/1
        je      .L5     # 20    *jcc_1
        tzcntq  %rdx, %rdx      # 53    *ctzdi2_falsedep
        movq    mytable+32(,%rdx,8), %rax       # 28    *movdi_internal/4
.L5:
        vzeroupper      # 51    avx_vzeroupper
        ret     # 58    simple_return_internal
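
For readers who prefer C to assembly, the following is a rough scalar model of
what the listing above computes. It is a reading of the assembly, not the
original test case: charset32 and mytable are the symbol names from the
listing, but their sizes and contents here are placeholders, and movemask_eq32
and lookup32_model are names invented for this sketch. The offset of 4 comes
from "mytable+32" with 8-byte elements, and __builtin_ctz is the GCC builtin
standing in for tzcnt:

```c
#include <stdint.h>

/* Symbol names from the assembly; sizes and contents are assumptions. */
static unsigned char charset32[32];
static int64_t mytable[36];

/* Scalar stand-in for vpcmpeqb + vpmovmskb:
   bit i of the result is set when set[i] == ch. */
static uint32_t movemask_eq32(const unsigned char *set, unsigned char ch)
{
    uint32_t mask = 0;
    for (int i = 0; i < 32; i++)
        if (set[i] == ch)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Model of lookup32 as read from the listing (an approximation). */
static int64_t lookup32_model(unsigned char ch)
{
    uint32_t mask = movemask_eq32(charset32, ch);
    if (mask == 0)                    /* testl %edx, %edx ; je .L5 */
        return 11141307;              /* movl $11141307, %eax      */
    int idx = __builtin_ctz(mask);    /* tzcntq %rdx, %rdx         */
    return mytable[idx + 4];          /* mytable+32(,%rdx,8)       */
}
```

Because the mask is zero-extended, a 64-bit tzcnt on the nonzero mask yields
the same index as a 32-bit count would, which is why the listing needs no
separate extension instruction between vpmovmskb and tzcntq.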

[Bug target/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb
