On Fri, May 22, 2020 at 11:52 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > On a related note, it looks that pmov stores are modelled in a wrong
> > way. For example, this pattern;
> >
> > (define_insn "*avx512f_<code>v8div16qi2_store"
> >   [(set (match_operand:V16QI 0 "memory_operand" "=m")
> >     (vec_concat:V16QI
> >       (any_truncate:V8QI
> >         (match_operand:V8DI 1 "register_operand" "v"))
> >       (vec_select:V8QI
> >         (match_dup 0)
> >         (parallel [(const_int 8) (const_int 9)
> >                (const_int 10) (const_int 11)
> >                (const_int 12) (const_int 13)
> >                (const_int 14) (const_int 15)]))))]
> >
> > models the store in 128bit mode, but according to ISA, it stores in 16bit 
> > mode.
> >
> according to ISA, it stores in 64bit mode
> vpmovqb xmm1/m64 {k1}{z}, zmm2.
>
> memory_operand is 128bit but upper 64bit is not changed which means it
> store only lower 64bits, just same meaning to ISA.
Sorry, I mixed up the insn patterns. This is the right example:

(define_insn "*avx512vl_<code>v2div2qi2_store"
  [(set (match_operand:V16QI 0 "memory_operand" "=m")
    (vec_concat:V16QI
      (any_truncate:V2QI
          (match_operand:V2DI 1 "register_operand" "v"))
      (vec_select:V14QI
        (match_dup 0)
        (parallel [(const_int 2) (const_int 3)
                   (const_int 4) (const_int 5)
                   (const_int 6) (const_int 7)
                   (const_int 8) (const_int 9)
                   (const_int 10) (const_int 11)
                   (const_int 12) (const_int 13)
                   (const_int 14) (const_int 15)]))))]
  "TARGET_AVX512VL"
  "vpmov<trunsuffix>qb\t{%1, %0|%w0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "memory" "store")
   (set_attr "prefix" "evex")
   (set_attr "mode" "TI")])

The ISA says:

EVEX.128.F3.0F38.W0 32 /r VPMOVQB xmm1/m16 {k1}{z}, xmm2

However, the pattern says that V16QImode is stored to memory. Because
of this, the insn template needs the %w modifier for the Intel dialect,
which is a sign that something is wrong with the pattern.

These conversions should be reimplemented with a nonimmediate_operand
output operand, and the memory operand case should be split into a
separate insn using a pre-reload splitter. Please see how the sse4_1
conversions handle their input operands.

Uros.
