Hi Jakub,

On 07 Apr 16:52, Jakub Jelinek wrote:
> Hi!
>
> This patch is slightly larger, so I haven't included it in the patch I've
> sent a few minutes ago.
>
> I've looked at godbolt for what ICC generates for these and picked sequences
> that generate approx. as good code as that.  For
> min_epi64/max_epi64/min_epu64/max_epu64 there is a slight complication that
> in AVX512F there is only _mm512_{min,max}_ep{i,u}64 but not the _mm256_ or
> _mm_ ones, so we need to perform 512-bit operations all the time rather than
> perform extractions, 256-bit operation, further extractions and then 128-bit
> operations.
>
> Seems we need to teach our permutation code further instructions, e.g.
> typedef long long V __attribute__((vector_size (64)));
> typedef int W __attribute__((vector_size (64)));
> W f0 (W x) {
>   return __builtin_shuffle (x, (W) { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2,
>                                      3, 4, 5, 6, 7 });
> }
> V f1 (V x) {
>   return __builtin_shuffle (x, (V) { 4, 5, 6, 7, 0, 1, 2, 3 });
> }
> generate unnecessarily bad code (could use vpshufi64x2 instruction),
> guess that can be resolved for GCC8.
>
> Tested with
> make -j272 -k check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp'
> on KNL, will bootstrap/regtest on my Haswell-E next, ok for trunk
> if that passes?

Patch is OK for trunk, thanks for implementing those intrinsics!
-- K
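
For readers following the thread, here is a rough scalar sketch of the "512-bit operations all the time" reduction described in the quoted mail. It is not the code from the patch. The type vec8 and the helpers swap_lanes, min_lanes and reduce_min_i64 are hypothetical names made up for illustration, and the lanes are modelled with a plain array so the sketch compiles without AVX512 support. Each step permutes the full vector and takes a full-width minimum. The first permutation is the same { 4, 5, 6, 7, 0, 1, 2, 3 } pattern as f1 in the quoted example.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8

/* Hypothetical scalar model of one 512-bit vector of eight 64-bit lanes. */
typedef struct { int64_t lane[LANES]; } vec8;

/* Model of a full-width permute: result lane i takes source lane i ^ xor_mask.
   With xor_mask == 4 this is the { 4, 5, 6, 7, 0, 1, 2, 3 } shuffle. */
vec8 swap_lanes(vec8 v, unsigned xor_mask)
{
    vec8 r;
    for (size_t i = 0; i < LANES; i++)
        r.lane[i] = v.lane[i ^ xor_mask];
    return r;
}

/* Model of a full-width signed 64-bit minimum (all eight lanes at once). */
vec8 min_lanes(vec8 a, vec8 b)
{
    vec8 r;
    for (size_t i = 0; i < LANES; i++)
        r.lane[i] = a.lane[i] < b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* Three permute+min steps, always at full width; afterwards every lane
   holds the minimum of the original eight lanes, so lane 0 is returned. */
int64_t reduce_min_i64(vec8 v)
{
    v = min_lanes(v, swap_lanes(v, 4)); /* swap 256-bit halves */
    v = min_lanes(v, swap_lanes(v, 2)); /* swap 128-bit pairs  */
    v = min_lanes(v, swap_lanes(v, 1)); /* swap adjacent lanes */
    return v.lane[0];
}
```

The same three-step shape would apply to the max and unsigned variants by replacing the comparison in min_lanes. The sketch deliberately never narrows to a 256-bit or 128-bit operation, since the quoted mail notes those forms of the 64-bit min/max are not available in AVX512F.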