Hi Jakub,
On 07 Apr 16:52, Jakub Jelinek wrote:
> Hi!
>
> This patch is slightly larger, so I haven't included it in the patch I've
> sent a few minutes ago.
>
> I've looked at godbolt for what ICC generates for these and picked sequences
> that generate approx. as good code as that.  For
> min_epi64/max_epi64/min_epu64/max_epu64 there is a slight complication that
> in AVX512F there is only _mm512_{min,max}_ep{i,u}64 but not the _mm256_ or
> _mm_ ones, so we need to perform 512-bit operations all the time rather than
> perform extractions, 256-bit operation, further extractions and then 128-bit
> operations.
>
> Seems we need to teach our permutation code further instructions, e.g.
> typedef long long V __attribute__((vector_size (64)));
> typedef int W __attribute__((vector_size (64)));
> W f0 (W x) {
>   return __builtin_shuffle (x, (W) { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 });
> }
> V f1 (V x) {
>   return __builtin_shuffle (x, (V) { 4, 5, 6, 7, 0, 1, 2, 3 });
> }
> generate unnecessarily bad code (could use vpshufi64x2 instruction),
> guess that can be resolved for GCC8.
>
> Tested with
> make -j272 -k check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp'
> on KNL, will bootstrap/regtest on my Haswell-E next, ok for trunk
> if that passes?
Patch is OK for trunk, thanks for implementing those intrinsics!
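
The "512-bit operations all the time" scheme described in the quoted text can be sketched roughly as follows. This is only an illustration written with GCC's generic vector extensions, not the sequences from the patch itself; the helper names max_v and reduce_max_epi64 are hypothetical, and __builtin_shuffle is GCC-specific.

```c
typedef long long V __attribute__((vector_size (64)));

/* Lane-wise signed maximum on the full 512-bit vector.  The comparison
   yields all-ones in lanes where a > b, which serves as a blend mask.  */
static V
max_v (V a, V b)
{
  V m = a > b;
  return (a & m) | (b & ~m);
}

/* Horizontal maximum of eight 64-bit lanes.  Every step permutes and
   compares the whole 512-bit vector instead of extracting 256-bit and
   128-bit halves first.  */
static long long
reduce_max_epi64 (V x)
{
  /* Swap the two 256-bit halves.  */
  x = max_v (x, __builtin_shuffle (x, (V) { 4, 5, 6, 7, 0, 1, 2, 3 }));
  /* Swap the 128-bit pairs within each half.  */
  x = max_v (x, __builtin_shuffle (x, (V) { 2, 3, 0, 1, 6, 7, 4, 5 }));
  /* Swap neighbouring 64-bit lanes.  */
  x = max_v (x, __builtin_shuffle (x, (V) { 1, 0, 3, 2, 5, 4, 7, 6 }));
  /* Every lane now holds the overall maximum; return lane 0.  */
  return x[0];
}
```

The first shuffle is the same permutation as f1 in the quoted text. Without -mavx512f GCC lowers the vector operations to narrower code (and may emit a -Wpsabi note), so the sketch shows the data flow only, not the instruction selection.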

--
K
