On Thu, Jun 15, 2023 at 1:23 PM Hongtao Liu <crazy...@gmail.com> wrote: > > On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich <jbeul...@suse.com> wrote: > > > > On 14.06.2023 09:41, Hongtao Liu wrote: > > > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches > > > <gcc-patches@gcc.gnu.org> wrote: > > >> > > >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are > > >> never longer (yet sometimes shorter) than the corresponding VSHUFPS / > > >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the > > >> need for VEX3 in the broadcast ones. When EVEX encoding is required the > > >> broadcast insns are always shorter. > > >> > > >> Add two new alternatives each, one covering the AVX2 case and one > > >> covering AVX512. > > > I think you can just change assemble output for this first alternative > > > when TARGET_AVX2, use vbroadcastss, else use vshufps since > > > vbroadcastss only accept register operand when TARGET_AVX2. And no > > > need to support 2 extra alternatives which doesn't make sense just > > > make RA more confused about the same meaning of different > > > alternatives. > > > > You mean by switching from "@ ..." to C code using "switch > > (which_alternative)"? I can do that, sure. Yet that'll make for a > > more complicated "length_immediate" attribute then. Would be nice > Yes, you can also do something like > (set (attr "length_immediate") > (cond [(eq_attr "alternative" "0") > (if_then_else (match_test "TARGET_AVX2) > (const_string "") > (const_string "1")) > ...] > > > if you could confirm that this is what you want, as I may well > > have misunderstood you. > > > > But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly > > different. > Yes, but can we use vpbroadcastd for vec_dupv4si similarly? > > > > >> --- > > >> I'm working from the assumption that the isa attributes to the original > > >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2 > > >> or avx_noavx2 as applicable), as the new earlier alternatives cover all > > >> operand forms already when at least AVX2 is enabled. > > >> > > >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss > > >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode> > > >> and elsewhere.) > > > Not sure about this part. I grep prefix_extra, seems only used by > > > znver.md/znver4.md for schedule, and only for comi instructions(?the > > > reservation name seems so). > > > > define_attr "length_vex" and define_attr "length" use it, too. > > Otherwise I would have asked whether the attribute couldn't be > > purged from most insns. > > > > My present understanding is that the attribute is wrong on > > vec_dupv4sf (and hence wants dropping from there altogether), and it > > should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1 > > only for the non-AVX pshufd case. I suspect at least the latter > > would be going to far for doing it "while here" right in this patch. > > Plus I think I have seen various other questionable uses of that > > attribute. > > > > >> Is use of Yv for the source operand really necessary in *vec_dupv4si? > > >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL > > > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is > > > allowed for scalar mode, but not for 128/256-bit vector modes. > > > > > > 20204 if (TARGET_AVX512F > > > 20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode) > > > 20206 || VALID_AVX512F_SCALAR_MODE (mode))) > > > 20207 return true; > > > > Okay, so I need to switch input constraints for relevant new > > alternatives to Yv (I actually wonder why I did use v in > > vec_dupv4sf, as it was clear to me that SFmode can be in the high > > 16 xmm registers with just AVX512F). > > > > >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they > > >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd > > >> alternative (or just m, as Yv is already covered by the 2nd one)? > > > I guess xm is more suitable since we still want to allocate > > > operands[1] to register when sse3_noavx. > > > It didn't hit any error since for avx and above, alternative 1(2rd > > > one) is always matched than alternative 2. > > > > I'm afraid I don't follow: With just -mavx512f the source operand > > can be in, say, %xmm16 (as per your clarification above). This > > would not match Yv, but it would match vm. And hence wrongly > > create an AVX512VL form of vmovddup. I didn't try it out earlier, > > because unlike for SFmode / DFmode I thought it's not really clear > > how to get the compiler to reliably put a DImode variable in an xmm > > reg, but it just occurred to me that this can be done the same way > > there. And voila, > > > > typedef long long __attribute__((vector_size(16))) v2di; > > > > v2di bcst(long long ll) { > > register long long x asm("xmm16") = ll; > > > > asm("nop %%esp" : "+v" (x)); > > return (v2di){x, x}; > > } > > > > compiled with just -mavx512f (and -O2) produces an AVX512VL insn. > Ah, I see, indeed it's a potential bug for -mavx512f -mavx512vl correct typo: -mno-avx512vl > I meant with -mavx512vl, > <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name> will be matched > instead of vec_dupv2di since it's put before vec_dupv2di in .md file, > first will be matched first for exactly the same pattern available. So > we just need to handle non-avx512 cases. > Also even w/o <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>, in > the same pattern vec_dupv2di, with -mavx512vl, Yv(alternative 1) will > always be match instead of Yvm(alternative 2) for register operand, so > Yv in Yvm is never be matched that's why I said xm is more suitable > (or enough). > You can use -dp to check .s file which alternative will be matched. > > I'll make another patch, yet for that I'm then also not sure why > > you say xm would be more suitable. Yvm allows for registers (with > > or without AVX, merely SSE being required) just as much as vm > > does, doesn't it? And I don't think I've found any combination of > > destination being v and source being xm anywhere. Plus we want to > > allow for the higher registers when AVX512VL is enabled. > > > > Jan > > > > -- > BR, > Hongtao
-- BR, Hongtao