[Bug rtl-optimization/80425] New: Extra inter-unit register move with zero-extension

ubizjak at gmail dot com Thu, 13 Apr 2017 23:42:08 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425


            Bug ID: 80425
           Summary: Extra inter-unit register move with zero-extension
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The testcase is taken from PR80381:

--cut here--
#include <x86intrin.h>

__m512i
f1 (__m512i x, int a)
{
  return _mm512_srai_epi32 (x, a);
}
--cut here--

When compiled with -O2 -mavx512f -mtune=intel, the resulting assembly reads:

f1:
        movl    %edi, %edi      # 8     *zero_extendsidi2/4     [length = 2]
        vmovq   %rdi, %xmm1     # 21    *movdi_internal/20      [length = 6]
        vpsrad  %xmm1, %zmm0, %zmm0     # 13    ashrv16si3/1    [length = 6]
        ret     # 24    simple_return_internal  [length = 1]

(insn 8) and (insn 21) could be merged into a single insn:

        vmovd   %edi, %xmm1     # 13    *zero_extendsidi2/10    [length = 6]
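
The merge should be valid because a 32-bit GPR->XMM move (vmovd) zeroes the
upper bits of the destination, so it yields the same value as the separate
zero-extension followed by a 64-bit move. A minimal sketch with SSE2
intrinsics, assuming an x86-64 target; the function names are made up for
illustration, and _mm_cvtsi32_si128 / _mm_cvtsi64_si128 are expected (not
guaranteed) to be emitted as movd / movq:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Two-step form: zero-extend in a GPR, then do a 64-bit GPR->XMM move.  */
static __m128i
count_two_step (int a)
{
  uint64_t wide = (uint32_t) a;                 /* like (insn 8)  */
  return _mm_cvtsi64_si128 ((long long) wide);  /* like (insn 21) */
}

/* Merged form: 32-bit GPR->XMM move, upper bits are zeroed.  */
static __m128i
count_merged (int a)
{
  return _mm_cvtsi32_si128 (a);
}

/* Return 1 when both forms produce the same 128-bit value.  */
static int
merged_matches_two_step (int a)
{
  __m128i eq = _mm_cmpeq_epi8 (count_two_step (a), count_merged (a));
  return _mm_movemask_epi8 (eq) == 0xFFFF;
}
```

merged_matches_two_step should return 1 for every input, including negative
values of a, since the upper 32 bits are zero in both forms.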

The register allocator somehow avoids zero-extending directly into an SSE
register in (insn 8) and instead generates an input reload (insn 21) for
(insn 13):

    Inserting insn reload before:
   21: r100:DI=r196:DI
         ...
         Choosing alt 19 in insn 21:  (0) ?*Yi  (1) r {*movdi_internal}

The RA could instead choose the same (?*Yi, r) alternative in (insn 8).

The REE pass doesn't merge (insn 8) and (insn 21) either.
