On Fri, May 22, 2020 at 11:52 AM Hongtao Liu <crazy...@gmail.com> wrote:

> > On a related note, it looks that pmov stores are modelled in a wrong
> > way. For example, this pattern;
> >
> > (define_insn "*avx512f_<code>v8div16qi2_store"
> >   [(set (match_operand:V16QI 0 "memory_operand" "=m")
> >     (vec_concat:V16QI
> >       (any_truncate:V8QI
> >         (match_operand:V8DI 1 "register_operand" "v"))
> >       (vec_select:V8QI
> >         (match_dup 0)
> >         (parallel [(const_int 8) (const_int 9)
> >                    (const_int 10) (const_int 11)
> >                    (const_int 12) (const_int 13)
> >                    (const_int 14) (const_int 15)]))))]
> >
> > models the store in 128bit mode, but according to ISA, it stores in 16bit
> > mode.
> >
> according to ISA, it stores in 64bit mode
> vpmovqb xmm1/m64 {k1}{z}, zmm2.
>
> memory_operand is 128bit but upper 64bit is not changed which means it
> store only lower 64bits, just same meaning to ISA.

Sorry, I somehow mixed up the insn patterns. This is the right example:

(define_insn "*avx512vl_<code>v2div2qi2_store"
  [(set (match_operand:V16QI 0 "memory_operand" "=m")
    (vec_concat:V16QI
      (any_truncate:V2QI
        (match_operand:V2DI 1 "register_operand" "v"))
      (vec_select:V14QI
        (match_dup 0)
        (parallel [(const_int 2) (const_int 3)
                   (const_int 4) (const_int 5)
                   (const_int 6) (const_int 7)
                   (const_int 8) (const_int 9)
                   (const_int 10) (const_int 11)
                   (const_int 12) (const_int 13)
                   (const_int 14) (const_int 15)]))))]
  "TARGET_AVX512VL"
  "vpmov<trunsuffix>qb\t{%1, %0|%w0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "memory" "store")
   (set_attr "prefix" "evex")
   (set_attr "mode" "TI")])

The ISA says:

  EVEX.128.F3.0F38.W0 32 /r VPMOVQB xmm1/m16 {k1}{z}, xmm2

However, the pattern says that V16QImode is stored to memory, while the
instruction only accesses 16 bits (m16). Due to this, the insn template
needs the %w modifier for the Intel dialect, which is a sign that
something is wrong with the pattern.

These conversions should be reimplemented with a nonimmediate_operand
output operand, and the memory operand should be split to a separate
insn using a pre-reload splitter. Please see how the sse4_1 conversions
handle their input operands.

Uros.
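
P.S. To make the mismatch concrete, here is a rough C sketch of the two
views of the V2DI -> V2QI store. It is only an illustration: the function
names are made up, it covers plain truncation only (not the saturating
variants), and it is not GCC code. Both models leave the same bytes in
memory, but the m16 form touches 2 bytes while the V16QI pattern
describes a 16-byte read-modify-write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the m16 form: truncate each 64-bit lane to
   8 bits and write exactly two bytes.  Returns the bytes accessed.  */
static size_t store_m16(uint8_t *dst, const uint64_t src[2])
{
    dst[0] = (uint8_t)src[0];
    dst[1] = (uint8_t)src[1];
    return 2;
}

/* Hypothetical model of what the V16QI pattern describes: read all
   16 bytes, replace the low two, and write all 16 bytes back.  */
static size_t store_as_v16qi(uint8_t *dst, const uint64_t src[2])
{
    uint8_t tmp[16];
    memcpy(tmp, dst, sizeof tmp);   /* the (match_dup 0) read */
    tmp[0] = (uint8_t)src[0];
    tmp[1] = (uint8_t)src[1];
    memcpy(dst, tmp, sizeof tmp);   /* the full V16QI write */
    return sizeof tmp;
}

/* Same resulting memory contents, different access footprint.  */
static void compare_models(void)
{
    const uint64_t src[2] = { 0x1122334455667788ULL, 0x99aabbccddeeff00ULL };
    uint8_t a[16], b[16];
    memset(a, 0xAA, sizeof a);
    memset(b, 0xAA, sizeof b);
    size_t na = store_m16(a, src);
    size_t nb = store_as_v16qi(b, src);
    assert(memcmp(a, b, sizeof a) == 0);
    assert(na == 2 && nb == 16);
}
```

Note that store_m16 is safe on a 2-byte buffer, while store_as_v16qi
needs all 16 bytes to be accessible.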