Hello Richard,
On 22 Oct 08:16, Richard Henderson wrote:
> On 10/22/2013 07:42 AM, Kirill Yukhin wrote:
> > Hello Richard,
> > Thanks for the remarks, they all seem reasonable.
> > 
> > One question
> > 
> > On 21 Oct 16:01, Richard Henderson wrote:
> >>> +(define_insn "avx512f_moves<mode>_mask"
> >>> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> >>> + (vec_merge:VF_128
> >>> +   (vec_merge:VF_128
> >>> +     (match_operand:VF_128 2 "register_operand" "v")
> >>> +     (match_operand:VF_128 3 "vector_move_operand" "0C")
> >>> +     (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
> >>> +   (match_operand:VF_128 1 "register_operand" "v")
> >>> +   (const_int 1)))]
> >>> +  "TARGET_AVX512F"
> >>> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> >>> +  [(set_attr "type" "ssemov")
> >>> +   (set_attr "prefix" "evex")
> >>> +   (set_attr "mode" "<sseinsnmode>")])
> >>
> >> Nested vec_merge?  That seems... odd to say the least.
> >> How in the world does this get matched?
> > 
> > This is a generic approach for all scalar `masked' instructions.
> > 
> > The reason is that we must preserve the higher bits of the vector
> > (outer vec_merge) and apply the single-bit mask (inner vec_merge).
> > 
> > 
> > We may do it with unspecs though... But is it really better?
> > 
> > What do you think?
> 
> What I think is that while it's an instruction that exists in the ISA,
> does that mean we must model it in the compiler?
> 
> How would this pattern be used?

When the mask is all-ones, the simplifier can reduce such a pattern to a
simpler form with a single vec_merge: the inner vec_merge then selects its
first operand unconditionally and collapses, leaving only the outer
vec_merge, which matches the existing non-masked pattern.
This would be impossible if we put an unspec there.

So, for example, for this code:
    __m128d
    foo (__m128d x, __m128d y)
    {
      return _mm_maskz_add_sd (-1, x, y);
    }
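
For reference, here is a rough C-level model of what that intrinsic computes
(just a sketch, assuming GCC's vector-subscript extension; maskz_add_sd_model
is a made-up helper, not anything from the headers):

    #include <immintrin.h>

    /* Hypothetical model of _mm_maskz_add_sd (k, x, y): element 0 is the
       masked sum, zeroed when bit 0 of the mask is clear (the inner
       vec_merge); element 1 is copied from x unchanged (the outer
       vec_merge).  */
    static __m128d
    maskz_add_sd_model (__mmask8 k, __m128d x, __m128d y)
    {
      __m128d r = x;                       /* outer merge keeps upper element */
      r[0] = (k & 1) ? x[0] + y[0] : 0.0;  /* inner merge: sum vs. zero */
      return r;
    }

With k known at compile time to be all-ones, the select folds away and what
remains is a plain scalar add that preserves the upper element, i.e. what the
unmasked vaddsd computes.  That is the folding we want the simplifier to be
able to perform on the RTL.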

With an unspec we get:
foo:
.LFB2328:
        movl    $-1, %eax       # 10    *movqi_internal/2       [length = 5]
        kmovw   %eax, %k1       # 24    *movqi_internal/8       [length = 4]
        vaddsd  %xmm1, %xmm0, %xmm0{%k1}{z}     # 11    sse2_vmaddv2df3_mask/2  [length = 6]
        ret     # 27    simple_return_internal  [length = 1]

While with the `semantic' version it is simplified to:
foo:
.LFB2329:
        vaddsd  %xmm1, %xmm0, %xmm0     # 11    sse2_vmaddv2df3/2       [length = 4]
        ret     # 26    simple_return_internal  [length = 1]

So, we get a short VEX insn instead of a longer EVEX one plus the
mask-creation insns.
That is why we want to expose the semantics of such operations.

Thanks, K
