On 10/28/2013 03:24 AM, Kirill Yukhin wrote:
> Hello Richard,
> On 22 Oct 08:16, Richard Henderson wrote:
>> On 10/22/2013 07:42 AM, Kirill Yukhin wrote:
>>> Hello Richard,
>>> Thanks for the remarks, they all seem reasonable.
>>>
>>> One question
>>>
>>> On 21 Oct 16:01, Richard Henderson wrote:
>>>>> +(define_insn "avx512f_moves<mode>_mask"
>>>>> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
>>>>> + (vec_merge:VF_128
>>>>> +   (vec_merge:VF_128
>>>>> +     (match_operand:VF_128 2 "register_operand" "v")
>>>>> +     (match_operand:VF_128 3 "vector_move_operand" "0C")
>>>>> +     (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
>>>>> +   (match_operand:VF_128 1 "register_operand" "v")
>>>>> +   (const_int 1)))]
>>>>> +  "TARGET_AVX512F"
>>>>> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
>>>>> +  [(set_attr "type" "ssemov")
>>>>> +   (set_attr "prefix" "evex")
>>>>> +   (set_attr "mode" "<sseinsnmode>")])
>>>>
>>>> Nested vec_merge?  That seems... odd to say the least.
>>>> How in the world does this get matched?
>>>
>>> This is the generic approach for all scalar `masked' instructions.
>>>
>>> The reason is that we must preserve the higher bits of the vector
>>> (outer vec_merge) and apply the single-bit mask (inner vec_merge).
>>>
>>>
>>> We may do it with unspecs though... But is it really better?
>>>
>>> What do you think?
>>
>> What I think is that while it's an instruction that exists in the ISA,
>> does that mean we must model it in the compiler?
>>
>> How would this pattern be used?
> 
> When we have an all-ones mask, the simplifier may reduce such a pattern
> to a simpler form with a single vec_merge.
> This would be impossible if we put an unspec there.
> 
> So, for example, for this code:
>     __m128d
>     foo (__m128d x, __m128d y)
>     {
>       return _mm_maskz_add_sd (-1, x, y);
>     }
> 
> With unspec we will have:
> foo:
> .LFB2328:
>         movl    $-1, %eax       # 10    *movqi_internal/2       [length = 5]
>         kmovw   %eax, %k1       # 24    *movqi_internal/8       [length = 4]
>         vaddsd  %xmm1, %xmm0, %xmm0{%k1}{z}     # 11    sse2_vmaddv2df3_mask/2 [length = 6]
>         ret     # 27    simple_return_internal  [length = 1]
> 
> While for `semantic' version it will be simplified to:
> foo:
> .LFB2329:
>         vaddsd  %xmm1, %xmm0, %xmm0     # 11    sse2_vmaddv2df3/2       [length = 4]
>         ret     # 26    simple_return_internal  [length = 1]
> 
> So, we have short VEX insn vs. long EVEX one + mask creation insns.
> That is why we want to expose semantics of such operations.

This is not the question that I asked.

Why is a masked *scalar* operation useful?  The only way I can see
that one would get created is that builtin, which begs the question
of why the builtin exists.


r~
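
For reference, the nested vec_merge in the quoted pattern can be read as two
per-element selections. What follows is only a minimal sketch in plain C,
assuming a two-element double vector and the usual RTL meaning of vec_merge
(element i comes from the first operand when bit i of the mask is set, and
from the second operand otherwise). The names v2df, vec_merge and
masked_scalar_move are hypothetical and chosen for illustration; they are not
GCC internals or intrinsics.

```c
/* Illustrative model only: a 2 x double vector standing in for V2DF. */
typedef struct { double e[2]; } v2df;

/* Model of RTL (vec_merge a b mask): element i is taken from a when
   bit i of mask is set, otherwise from b. */
static v2df vec_merge(v2df a, v2df b, unsigned mask)
{
    v2df r;
    for (int i = 0; i < 2; i++)
        r.e[i] = ((mask >> i) & 1u) ? a.e[i] : b.e[i];
    return r;
}

/* Model of the quoted pattern's nested vec_merge.
   op1: source of the upper elements.
   op2: source of the scalar (element 0).
   op3: fallback for element 0 when the mask bit is clear
        (the old destination, or all zeros for zero-masking).
   k:   mask register value; only bit 0 affects the result. */
static v2df masked_scalar_move(v2df op1, v2df op2, v2df op3, unsigned k)
{
    v2df inner = vec_merge(op2, op3, k);   /* apply the single-bit mask */
    return vec_merge(inner, op1, 1u);      /* keep upper elements of op1 */
}
```

With an all-ones k, the inner selection always picks op2, so the whole
expression collapses to vec_merge(op2, op1, 1), the single vec_merge form
mentioned in the quoted text.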
