On Mon, 2019-12-30 at 00:46 +0100, Jakub Jelinek wrote:
> Hi!
> 
> The AVX512F documentation clearly states that in instructions where the
> destination is a memory only merging-masking is possible, not zero-masking,
> and the assembler enforces that.
> 
> The testcase in this patch fails to assemble because of
> Error: unsupported masking for `vextracti32x8'
> on
>         vextracti32x8   $0x0, %zmm1, -64(%rsp){%k1}{z}
> For the vector extraction patterns, we apparently have 7 *_maskm patterns
> that only accept memory destinations and rtx_equal_p merge-masking source
> for it, 7 *<mask_name> corresponding patterns that allow memory destination
> only for the non-masked cases (through <store_mask_constraint>), then 2
> *<mask_name> patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
> do allow memory destination even for masked cases and are the cause of the
> testsuite failure, because we must not allow C constraint if the destination
> is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
> VI4F_256), which has another issue (for which I don't have a testcase
> though), where if it would match zero-masking with register destination,
> it wouldn't emit the needed {z} into assembly.
> The attached patch fixes those 3 issues only, perhaps more suitable for
> backporting.
> But, even with that fixed, we are missing 3 further *_maskm patterns and
> more importantly, I find the split into 3 separate patterns after subst,
> *_maskm for masking with memory destination, *_mask for masking with
> register destination and * for non-masking unnecessarily complex and harder
> for reload, so the included patch below (non-attached) instead kills all
> *_maskm patterns and splits the *<mask_name> patterns into * and *_mask
> by hand instead of subst, where the *_mask ones make sure that with v
> destination they use 0C, while with m destination they use 0 and as
> condition enforce that either destination is not MEM, or rtx_equal_p between
> the destination and corresponding merging-masking operand source.
> If we had those 3 missing *_maskm patterns, this patch would actually result
> in both shorter sse.md and shorter machine description after subst (e.g.
> length of tmp-mddump.md), as we don't have them, the patch is actually 16
> lines longer sse.md, but still shorter tmp-mddump.md.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (and is
> the shorter patch ok for backports)?
> 
> 2019-12-30  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR target/93069
>       * config/i386/subst.md (store_mask_constraint, store_mask_predicate):
>       Remove.
>       (avx512dq_vextract<shuffletype>64x2_1_maskm,
>       avx512f_vextract<shuffletype>32x4_1_maskm,
>       vec_extract_lo_<mode>_maskm, vec_extract_hi_<mode>_maskm): Remove.
>       (<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name>): Split
>       into ...
>       (*avx512dq_vextract<shuffletype>64x2_1,
>       avx512dq_vextract<shuffletype>64x2_1_mask): ... these new
>       define_insns.  Even in the masked variant allow memory output but in
>       that case use 0 rather than 0C constraint on the source of masked-out
>       elts.
>       (<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>): Split
>       into ...
>       (*avx512f_vextract<shuffletype>32x4_1,
>       avx512f_vextract<shuffletype>32x4_1_mask): ... these new define_insns.
>       Even in the masked variant allow memory output but in that case use
>       0 rather than 0C constraint on the source of masked-out elts.
>       (vec_extract_lo_<mode><mask_name>): Split into ...
>       (vec_extract_lo_<mode>, vec_extract_lo_<mode>_mask): ... these new
>       define_insns.  Even in the masked variant allow memory output but in
>       that case use 0 rather than 0C constraint on the source of masked-out
>       elts.
>       (vec_extract_hi_<mode><mask_name>): Split into ...
>       (vec_extract_hi_<mode>, vec_extract_hi_<mode>_mask): ... these new
>       define_insns.  Even in the masked variant allow memory output but in
>       that case use 0 rather than 0C constraint on the source of masked-out
>       elts.
> 
>       * gcc.target/i386/avx512vl-pr93069.c: New test.
>       * gcc.dg/vect/pr93069.c: New test.
Sorry.  I know you asked me to look at this eons ago, but ever time I just get
lost.

I get the distinct impression that we could do something much simpler (the patch
you initially proposed for backporting to the release branches).  Perhaps we go
with that now and the full patch for gcc-11?


Jeff

Reply via email to