"Yangfei (Felix)" <felix.y...@huawei.com> writes: > Thanks for reviewing this. > Attached please find the v5 patch. > Note: we also need to modify local variable "mode" once we catch one case. I > see test failure without this change.
Looks good. Patch is OK assuming the x86 folks don't want to rewrite gcc.target/i386/pr67609.c to avoid the new optimisation. I'll hold off applying until the AVX512 thing is sorted. > Bootstrapped and tested on aarch64-linux-gnu. > Also bootstrapped on x86_64-linux-gnu. Regression test show two failed tests > on this platform: Thanks for the extra testing. > 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors) > 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa > > I have adjust the second one in the v4 patch. So this is: movdqa reg(%rip), %xmm1 movaps %xmm1, -24(%rsp) movsd %xmm0, -24(%rsp) movapd -24(%rsp), %xmm2 movaps %xmm2, reg(%rip) ret to: movq %xmm0, reg(%rip) ret Nice. I think it's safe to say that's an improvement :-) I don't know whether this means we're no longer testing what the test was intended to test. Maybe one of the x86 folks has an opinion about whether we should instead rewrite the test somehow. > But The first one looks strange to me. > I see gcc emits invalid x86 vcvtps2ph instrunctions which looks like: > > 125 vcvtps2ph $0, %zmm0, -112(%rbp){%k1} > 126 vcvtps2ph $0, %zmm0, -80(%rbp){%k1}{z} > > This happens in the combine phase, where an combined insn looks like: > > 1989 Trying 31 -> 33: > 1990 31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] > 133,[frame:DI-0x60],r109:HI) > 1991 33: [frame:DI-0x60]=r106:V16HI > 1992 REG_DEAD r106:V16HI > 1993 Successfully matched this instruction: > 1994 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > 1995 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 A256]) > 1996 (vec_merge:V16HI (unspec:V16HI [ > 1997 (reg:V16SF 103) > 1998 (const_int 0 [0]) > 1999 ] UNSPEC_VCVTPS2PH) > 2000 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > 2001 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 > A256]) > 2002 (reg:HI 109))) > 2003 allowing combination of insns 31 and 33 > 2004 original costs 16 + 4 = 20 > 2005 replacement cost 16 > 2006 deferring deletion of insn with uid = 31. > 2007 modifying insn i3 33: [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0] > 133,[frame:DI-0x60],r109:HI) > 2008 deferring rescan insn with uid = 33. > > And this can be matched with pattern: avx512f_vcvtps2ph512_mask > 2282 (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > 2283 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 > A256]) > 2284 (vec_merge:V16HI (unspec:V16HI [ > 2285 (reg:V16SF 103) > 2286 (const_int 0 [0]) > 2287 ] UNSPEC_VCVTPS2PH) > 2288 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > 2289 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 > S32 A256]) > 2290 (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324 > {avx512f_vcvtps2ph512_mask} > 2291 (nil)) > > gcc/config/i386/sse.md: > 21663 (define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>" > 21664 [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm") > 21665 (unspec:V16HI > 21666 [(match_operand:V16SF 1 "register_operand" "v") > 21667 (match_operand:SI 2 "const_0_to_255_operand" "N")] > 21668 UNSPEC_VCVTPS2PH))] > 21669 "TARGET_AVX512F" > 21670 "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}" > 21671 [(set_attr "type" "ssecvt") > 21672 (set_attr "prefix" "evex") > 21673 (set_attr "mode" "V16SF")]) > > How can that happen? This is due to define_subst magic. The generators automatically create a vec_merge form of the instruction based on the information in the <mode_*> attributes. AFAICT the rtl above is for the line-125 instruction, which looks ok. The problem is the line-126 instruction, since vcvtps2ph doesn't AIUI allow zero masking. The "mask" define_subst allows both zeroing and merging, so I guess this means that the pattern should either be using a different define_subst, or should be enforcing merging in some other way. Please could one of the x86 devs take a look? Thanks, Richard