            Bug ID: 93613
           Summary: Missed optimization with _mm256_permute2x128_si256 intrinsic
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot
          Reporter: jakub at gcc dot
                CC: andysem at mail dot ru
        Depends on: 93594
  Target Milestone: ---

+++ This bug was initially created as a clone of Bug #93594 +++

#include <x86intrin.h>

__m256i
foo (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_castsi128_si256 (x),
                                    _mm256_castsi128_si256 (x), 0x80);
}

__m256i
bar (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_setzero_si256 (),
                                    _mm256_castsi128_si256 (x), 0x02);
}

__m256i
baz (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_castsi128_si256 (x),
                                    _mm256_setzero_si256 (), 0x20);
}

__m256i
qux (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_set_epi64x (1, 2, 3, 4),
                                    _mm256_set_epi64x (5, 6, 7, 8), 0x80);
}

__m256i
corge (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_set_epi64x (1, 2, 3, 4),
                                    _mm256_set_epi64x (5, 6, 7, 8), 0x02);
}

__m256i
quux (__m128i x)
{
  return _mm256_permute2x128_si256 (_mm256_set_epi64x (1, 2, 3, 4),
                                    _mm256_set_epi64x (5, 6, 7, 8), 0x20);
}

The _mm256_permute2x128_si256 issues are similar, but really unrelated and
IMHO should be tracked in a separate PR.  The problem there is that the
pattern we use doesn't really describe what the instruction does: it uses
UNSPEC_VPERMTI, which obviously can't be simplified by the generic code.
The reason is mainly that the instruction isn't just a two-source
permutation, but essentially a three-source permutation, with the third
source being zero.

Referenced Bugs:

[Bug 93594] Missed optimization with _mm256_set/setr_m128i intrinsics
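
For illustration only, here is a scalar sketch of the selection described above, where each 128-bit half of the result comes from one of the two sources or from zero. It follows the documented behaviour of the intrinsic (bits 1:0 of each 4-bit control field select a lane, bit 3 zeroes it). The names v256_model, select_lane and permute2x128_model are made up for this sketch and are not GCC or Intel identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 256-bit vector: lane[0] is the low
   128 bits, lane[1] the high 128 bits, each held as two 64-bit words.  */
typedef struct { uint64_t lane[2][2]; } v256_model;

/* Pick one 128-bit lane from a 4-bit control field:
   bits 1:0 choose a.lo, a.hi, b.lo or b.hi; bit 3 forces the lane to
   zero (the implicit "third source").  */
static void
select_lane (uint64_t out[2], const v256_model *a, const v256_model *b,
             unsigned ctl)
{
  const v256_model *src = (ctl & 2) ? b : a;
  memcpy (out, src->lane[ctl & 1], 2 * sizeof (uint64_t));
  if (ctl & 8)
    out[0] = out[1] = 0;
}

/* Model of _mm256_permute2x128_si256 (a, b, imm8).  */
static v256_model
permute2x128_model (v256_model a, v256_model b, unsigned imm8)
{
  v256_model r;
  select_lane (r.lane[0], &a, &b, imm8 & 0x0f);
  select_lane (r.lane[1], &a, &b, (imm8 >> 4) & 0x0f);
  return r;
}

/* Model of _mm256_castsi128_si256: the upper lane is undefined, so the
   caller passes arbitrary junk to stand in for it.  */
static v256_model
cast_model (uint64_t lo0, uint64_t lo1, uint64_t junk)
{
  v256_model r = { { { lo0, lo1 }, { junk, ~junk } } };
  return r;
}

/* Returns 1 if foo, bar and baz from the report all yield x in the low
   lane and zero in the high lane, whatever the undefined bits are.  */
static int
all_zero_extend (uint64_t x0, uint64_t x1, uint64_t junk)
{
  v256_model x = cast_model (x0, x1, junk);
  v256_model zero = { { { 0, 0 }, { 0, 0 } } };
  v256_model want = { { { x0, x1 }, { 0, 0 } } };
  v256_model foo = permute2x128_model (x, x, 0x80);
  v256_model bar = permute2x128_model (zero, x, 0x02);
  v256_model baz = permute2x128_model (x, zero, 0x20);
  return memcmp (&foo, &want, sizeof want) == 0
         && memcmp (&bar, &want, sizeof want) == 0
         && memcmp (&baz, &want, sizeof want) == 0;
}
```

Under this model foo, bar and baz all reduce to x zero-extended to 256 bits, and the qux/corge/quux results are compile-time constants. A pattern that exposed the selection to the generic code, rather than hiding it in an unspec, would presumably allow that kind of folding.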