Hello Richard,

On 22 Oct 08:16, Richard Henderson wrote:
> On 10/22/2013 07:42 AM, Kirill Yukhin wrote:
> > Hello Richard,
> > Thanks for the remarks, they all seem reasonable.
> >
> > One question.
> >
> > On 21 Oct 16:01, Richard Henderson wrote:
> >>> +(define_insn "avx512f_moves<mode>_mask"
> >>> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> >>> +	(vec_merge:VF_128
> >>> +	  (vec_merge:VF_128
> >>> +	    (match_operand:VF_128 2 "register_operand" "v")
> >>> +	    (match_operand:VF_128 3 "vector_move_operand" "0C")
> >>> +	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
> >>> +	  (match_operand:VF_128 1 "register_operand" "v")
> >>> +	  (const_int 1)))]
> >>> +  "TARGET_AVX512F"
> >>> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> >>> +  [(set_attr "type" "ssemov")
> >>> +   (set_attr "prefix" "evex")
> >>> +   (set_attr "mode" "<sseinsnmode>")])
> >>
> >> Nested vec_merge?  That seems... odd to say the least.
> >> How in the world does this get matched?
> >
> > This is a generic approach for all scalar `masked' instructions.
> >
> > The reason is that we must preserve the higher bits of the vector
> > (the outer vec_merge) and apply the single-bit mask (the inner
> > vec_merge).
> >
> > We could do it with unspecs... but is that really better?
> >
> > What do you think?
>
> What I think is that while it's an instruction that exists in the ISA,
> does that mean we must model it in the compiler?
>
> How would this pattern be used?
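For reference, here is a plain-C sketch of what the nested vec_merge is meant to express, using _mm_maskz_add_sd as the example (the function name and layout below are illustrative only, not taken from the patch):

```c
#include <assert.h>

/* Scalar model of the nested vec_merge for a zero-masked scalar add:
   - inner vec_merge: select the computed value or zero, per the
     low bit of the mask;
   - outer vec_merge with (const_int 1): take lane 0 from that
     result and the upper lane unchanged from operand 1.  */
static void
maskz_add_sd (double dst[2], const double x[2], const double y[2],
              unsigned char mask)
{
  /* Inner merge: computed lane vs. zero vector, controlled by mask.  */
  double inner = (mask & 1) ? x[0] + y[0] : 0.0;

  dst[0] = inner;   /* Outer merge: lane 0 from the inner result.  */
  dst[1] = x[1];    /* Outer merge: upper bits preserved from op 1.  */
}
```

With mask = -1 (all ones) the inner merge is a no-op, and the whole expression collapses to a plain scalar add plus upper-bits copy, which is exactly what the simplifier exploits below.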
When we have an all-ones mask, the simplifier may reduce such a pattern
to a simpler form with a single vec_merge.  This would be impossible if
we put an unspec there.

So, for example, for this code:

__m128d
foo (__m128d x, __m128d y)
{
  return _mm_maskz_add_sd (-1, x, y);
}

With an unspec we will have:

foo:
.LFB2328:
	movl	$-1, %eax	# 10	*movqi_internal/2	[length = 5]
	kmovw	%eax, %k1	# 24	*movqi_internal/8	[length = 4]
	vaddsd	%xmm1, %xmm0, %xmm0{%k1}{z}	# 11	sse2_vmaddv2df3_mask/2	[length = 6]
	ret	# 27	simple_return_internal	[length = 1]

While the `semantic' version is simplified to:

foo:
.LFB2329:
	vaddsd	%xmm1, %xmm0, %xmm0	# 11	sse2_vmaddv2df3/2	[length = 4]
	ret	# 26	simple_return_internal	[length = 1]

So we get a short VEX insn vs. a long EVEX one plus the mask-creation
insns.  That is why we want to expose the semantics of such operations.

Thanks, K