On 10/28/2013 03:24 AM, Kirill Yukhin wrote:
> Hello Richard,
> On 22 Oct 08:16, Richard Henderson wrote:
>> On 10/22/2013 07:42 AM, Kirill Yukhin wrote:
>>> Hello Richard,
>>> Thanks for the remarks, they all seem reasonable.
>>>
>>> One question
>>>
>>> On 21 Oct 16:01, Richard Henderson wrote:
>>>>> +(define_insn "avx512f_moves<mode>_mask"
>>>>> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
>>>>> +        (vec_merge:VF_128
>>>>> +          (vec_merge:VF_128
>>>>> +            (match_operand:VF_128 2 "register_operand" "v")
>>>>> +            (match_operand:VF_128 3 "vector_move_operand" "0C")
>>>>> +            (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
>>>>> +          (match_operand:VF_128 1 "register_operand" "v")
>>>>> +          (const_int 1)))]
>>>>> +  "TARGET_AVX512F"
>>>>> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
>>>>> +  [(set_attr "type" "ssemov")
>>>>> +   (set_attr "prefix" "evex")
>>>>> +   (set_attr "mode" "<sseinsnmode>")])
>>>>
>>>> Nested vec_merge? That seems... odd to say the least.
>>>> How in the world does this get matched?
>>>
>>> This is the generic approach for all scalar `masked' instructions.
>>>
>>> The reason is that we must preserve the higher bits of the vector (outer
>>> vec_merge) and apply a single-bit mask (inner vec_merge).
>>>
>>> We could do it with unspecs instead... But is that really better?
>>>
>>> What do you think?
>>
>> What I think is that while it's an instruction that exists in the ISA,
>> does that mean we must model it in the compiler?
>>
>> How would this pattern be used?
>
> When we have an all-1 mask, the simplifier can reduce such a pattern to a
> simpler form with a single vec_merge.
> This would be impossible if we put an unspec there.
>
> So, for example, for this code:
> __m128d
> foo (__m128d x, __m128d y)
> {
>   return _mm_maskz_add_sd (-1, x, y);
> }
>
> With an unspec we will have:
> foo:
> .LFB2328:
>         movl    $-1, %eax       # 10    *movqi_internal/2       [length = 5]
>         kmovw   %eax, %k1       # 24    *movqi_internal/8       [length = 4]
>         vaddsd  %xmm1, %xmm0, %xmm0{%k1}{z}     # 11    sse2_vmaddv2df3_mask/2  [length = 6]
>         ret     # 27    simple_return_internal  [length = 1]
>
> While the `semantic' version will be simplified to:
> foo:
> .LFB2329:
>         vaddsd  %xmm1, %xmm0, %xmm0     # 11    sse2_vmaddv2df3/2       [length = 4]
>         ret     # 26    simple_return_internal  [length = 1]
>
> So we have a short VEX insn vs. a long EVEX one plus mask-creation insns.
> That is why we want to expose the semantics of such operations.

This is not the question that I asked.

Why is a masked *scalar* operation useful?

The only way I can see that one would get created is that builtin,
which begs the question of why the builtin exists.


r~
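For reference, a minimal C sketch of what the nested vec_merge in avx512f_moves<mode>_mask computes, assuming a two-lane double vector (V2DF) and the RTL rule that (vec_merge A B SEL) takes lane i from A when bit i of SEL is set, else from B. The type v2df and the function names are hypothetical illustrations, not GCC internals or intrinsic APIs. It shows the point Kirill makes: with an all-ones mask the inner merge returns operand 2 unchanged, so the whole expression equals the plain unmasked scalar form, a rewrite an unspec would hide from the simplifier.

```c
#include <stdint.h>

/* Hypothetical 2-lane double vector standing in for V2DF / __m128d. */
typedef struct { double lane[2]; } v2df;

/* RTL vec_merge model: lane i comes from a if bit i of sel is set, else from b. */
static v2df vec_merge_v2df(v2df a, v2df b, uint32_t sel)
{
    v2df r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = ((sel >> i) & 1u) ? a.lane[i] : b.lane[i];
    return r;
}

/* Nested pattern: the inner merge picks lane 0 of op2 or op3 under mask k;
   the outer merge (const_int 1) keeps that lane 0 and takes lane 1 from op1. */
static v2df masked_scalar_move(v2df op1, v2df op2, v2df op3, uint32_t k)
{
    v2df inner = vec_merge_v2df(op2, op3, k);
    return vec_merge_v2df(inner, op1, 1u);
}

/* Unmasked scalar form: (vec_merge op2 op1 (const_int 1)). */
static v2df scalar_move(v2df op1, v2df op2)
{
    return vec_merge_v2df(op2, op1, 1u);
}
```

Passing an all-zero vector as op3 models the zero-masking ("C") alternative: with k = 0, lane 0 of the result is 0.0 and lane 1 still comes from op1.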