https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93613
            Bug ID: 93613
           Summary: Missed optimization with _mm256_permute2x128_si256
                    intrinsic
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
                CC: andysem at mail dot ru
        Depends on: 93594
  Target Milestone: ---

+++ This bug was initially created as a clone of Bug #93594 +++

#include <x86intrin.h>

__m256i
foo (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_castsi128_si256 (x),
                                    _mm256_castsi128_si256 (x), 0x80);
}

__m256i
bar (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_setzero_si256 (),
                                    _mm256_castsi128_si256 (x), 0x02);
}

__m256i
baz (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_castsi128_si256 (x),
                                    _mm256_setzero_si256 (), 0x20);
}

__m256i
qux (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_set_epi64x (1, 2, 3, 4),
                                    _mm256_set_epi64x (5, 6, 7, 8), 0x80);
}

__m256i
corge (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_set_epi64x (1, 2, 3, 4),
                                    _mm256_set_epi64x (5, 6, 7, 8), 0x02);
}

__m256i
quux (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_set_epi64x (1, 2, 3, 4),
                                    _mm256_set_epi64x (5, 6, 7, 8), 0x20);
}

The _mm256_permute2x128_si256 issues are similar, but really unrelated, and IMHO
should be tracked in a separate PR.  The problem there is that the pattern we
use doesn't really describe what the instruction does; it uses an
UNSPEC_VPERMTI, which obviously can't be simplified by the generic code.  The
reason is mainly that the instruction isn't just a two-source permutation, but
essentially a three-source permutation, with the third source being 0.
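
To make the "third source of 0" point concrete, here is a rough scalar model of
the selection the instruction performs, as I understand the documented imm8
encoding.  It is only a sketch: v256_model, select_half, permute2x128_model,
same and check_zero_extension are hypothetical names made up for illustration,
not anything in GCC or in the intrinsic headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 256-bit vector: four 64-bit
   elements, element 0 being the lowest.  */
typedef struct { uint64_t e[4]; } v256_model;

/* Fill one 128-bit half of the result from a 4-bit control field:
   bit 3 selects the implicit zero source, otherwise bit 1 picks
   A or B and bit 0 picks that operand's low or high half.  */
static void
select_half (uint64_t out[2], const v256_model *a, const v256_model *b,
             unsigned ctl)
{
  if (ctl & 8)
    {
      out[0] = out[1] = 0;
      return;
    }
  const v256_model *src = (ctl & 2) ? b : a;
  unsigned off = (ctl & 1) ? 2 : 0;
  out[0] = src->e[off];
  out[1] = src->e[off + 1];
}

/* Model of the whole permutation: the low nibble of IMM controls the
   low half of the result, the high nibble controls the high half.  */
static v256_model
permute2x128_model (v256_model a, v256_model b, unsigned imm)
{
  v256_model r;
  select_half (&r.e[0], &a, &b, imm & 0xf);
  select_half (&r.e[2], &a, &b, (imm >> 4) & 0xf);
  return r;
}

static int
same (v256_model p, v256_model q)
{
  return memcmp (p.e, q.e, sizeof p.e) == 0;
}

/* Under this model foo, bar and baz all reduce to a zero-extension
   of x.  The upper half of a cast result is undefined, so it is
   filled with arbitrary junk here.  */
static void
check_zero_extension (uint64_t lo, uint64_t hi)
{
  v256_model x = {{ lo, hi, 0xdead, 0xbeef }};
  v256_model z = {{ 0, 0, 0, 0 }};
  v256_model want = {{ lo, hi, 0, 0 }};

  assert (same (permute2x128_model (x, x, 0x80), want));  /* foo */
  assert (same (permute2x128_model (z, x, 0x02), want));  /* bar */
  assert (same (permute2x128_model (x, z, 0x20), want));  /* baz */
}
```

With the selection spelled out like this, the first three testcases are all the
same zero-extension of x, and qux, corge and quux are plain constants, which is
the kind of folding an opaque unspec hides from the generic code.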


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93594
[Bug 93594] Missed optimization with _mm256_set/setr_m128i intrinsics
