On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich <jbeul...@suse.com> wrote: > > On 14.06.2023 09:41, Hongtao Liu wrote: > > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > >> > >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are > >> never longer (yet sometimes shorter) than the corresponding VSHUFPS / > >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the > >> need for VEX3 in the broadcast ones. When EVEX encoding is required the > >> broadcast insns are always shorter. > >> > >> Add two new alternatives each, one covering the AVX2 case and one > >> covering AVX512. > > I think you can just change assemble output for this first alternative > > when TARGET_AVX2, use vbroadcastss, else use vshufps since > > vbroadcastss only accept register operand when TARGET_AVX2. And no > > need to support 2 extra alternatives which doesn't make sense just > > make RA more confused about the same meaning of different > > alternatives. > > You mean by switching from "@ ..." to C code using "switch > (which_alternative)"? I can do that, sure. Yet that'll make for a > more complicated "length_immediate" attribute then. Would be nice Yes, you can also do something like (set (attr "length_immediate") (cond [(eq_attr "alternative" "0") (if_then_else (match_test "TARGET_AVX2) (const_string "") (const_string "1")) ...]
> if you could confirm that this is what you want, as I may well > have misunderstood you. > > But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly > different. Yes, but can we use vpbroadcastd for vec_dupv4si similarly? > > >> --- > >> I'm working from the assumption that the isa attributes to the original > >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2 > >> or avx_noavx2 as applicable), as the new earlier alternatives cover all > >> operand forms already when at least AVX2 is enabled. > >> > >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss > >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode> > >> and elsewhere.) > > Not sure about this part. I grep prefix_extra, seems only used by > > znver.md/znver4.md for schedule, and only for comi instructions(?the > > reservation name seems so). > > define_attr "length_vex" and define_attr "length" use it, too. > Otherwise I would have asked whether the attribute couldn't be > purged from most insns. > > My present understanding is that the attribute is wrong on > vec_dupv4sf (and hence wants dropping from there altogether), and it > should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1 > only for the non-AVX pshufd case. I suspect at least the latter > would be going to far for doing it "while here" right in this patch. > Plus I think I have seen various other questionable uses of that > attribute. > > >> Is use of Yv for the source operand really necessary in *vec_dupv4si? > >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL > > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is > > allowed for scalar mode, but not for 128/256-bit vector modes. > > > > 20204 if (TARGET_AVX512F > > 20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode) > > 20206 || VALID_AVX512F_SCALAR_MODE (mode))) > > 20207 return true; > > Okay, so I need to switch input constraints for relevant new > alternatives to Yv (I actually wonder why I did use v in > vec_dupv4sf, as it was clear to me that SFmode can be in the high > 16 xmm registers with just AVX512F). > > >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they > >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd > >> alternative (or just m, as Yv is already covered by the 2nd one)? > > I guess xm is more suitable since we still want to allocate > > operands[1] to register when sse3_noavx. > > It didn't hit any error since for avx and above, alternative 1(2rd > > one) is always matched than alternative 2. > > I'm afraid I don't follow: With just -mavx512f the source operand > can be in, say, %xmm16 (as per your clarification above). This > would not match Yv, but it would match vm. And hence wrongly > create an AVX512VL form of vmovddup. I didn't try it out earlier, > because unlike for SFmode / DFmode I thought it's not really clear > how to get the compiler to reliably put a DImode variable in an xmm > reg, but it just occurred to me that this can be done the same way > there. And voila, > > typedef long long __attribute__((vector_size(16))) v2di; > > v2di bcst(long long ll) { > register long long x asm("xmm16") = ll; > > asm("nop %%esp" : "+v" (x)); > return (v2di){x, x}; > } > > compiled with just -mavx512f (and -O2) produces an AVX512VL insn. Ah, I see, indeed it's a potential bug for -mavx512f -mavx512vl I meant with -mavx512vl, <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name> will be matched instead of vec_dupv2di since it's put before vec_dupv2di in .md file, first will be matched first for exactly the same pattern available. So we just need to handle non-avx512 cases. Also even w/o <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>, in the same pattern vec_dupv2di, with -mavx512vl, Yv(alternative 1) will always be match instead of Yvm(alternative 2) for register operand, so Yv in Yvm is never be matched that's why I said xm is more suitable (or enough). You can use -dp to check .s file which alternative will be matched. > I'll make another patch, yet for that I'm then also not sure why > you say xm would be more suitable. Yvm allows for registers (with > or without AVX, merely SSE being required) just as much as vm > does, doesn't it? And I don't think I've found any combination of > destination being v and source being xm anywhere. Plus we want to > allow for the higher registers when AVX512VL is enabled. > > Jan -- BR, Hongtao