On 14.06.2023 09:41, Hongtao Liu wrote: > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are >> never longer (yet sometimes shorter) than the corresponding VSHUFPS / >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the >> need for VEX3 in the broadcast ones. When EVEX encoding is required the >> broadcast insns are always shorter. >> >> Add two new alternatives each, one covering the AVX2 case and one >> covering AVX512. > I think you can just change assemble output for this first alternative > when TARGET_AVX2, use vbroadcastss, else use vshufps since > vbroadcastss only accept register operand when TARGET_AVX2. And no > need to support 2 extra alternatives which doesn't make sense just > make RA more confused about the same meaning of different > alternatives.
You mean by switching from "@ ..." to C code using "switch (which_alternative)"? I can do that, sure. Yet that'll make for a more complicated "length_immediate" attribute then. Would be nice if you could confirm that this is what you want, as I may well have misunderstood you. But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly different. >> --- >> I'm working from the assumption that the isa attributes to the original >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2 >> or avx_noavx2 as applicable), as the new earlier alternatives cover all >> operand forms already when at least AVX2 is enabled. >> >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode> >> and elsewhere.) > Not sure about this part. I grep prefix_extra, seems only used by > znver.md/znver4.md for schedule, and only for comi instructions(?the > reservation name seems so). define_attr "length_vex" and define_attr "length" use it, too. Otherwise I would have asked whether the attribute couldn't be purged from most insns. My present understanding is that the attribute is wrong on vec_dupv4sf (and hence wants dropping from there altogether), and it should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1 only for the non-AVX pshufd case. I suspect at least the latter would be going to far for doing it "while here" right in this patch. Plus I think I have seen various other questionable uses of that attribute. >> Is use of Yv for the source operand really necessary in *vec_dupv4si? >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is > allowed for scalar mode, but not for 128/256-bit vector modes. > > 20204 if (TARGET_AVX512F > 20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode) > 20206 || VALID_AVX512F_SCALAR_MODE (mode))) > 20207 return true; Okay, so I need to switch input constraints for relevant new alternatives to Yv (I actually wonder why I did use v in vec_dupv4sf, as it was clear to me that SFmode can be in the high 16 xmm registers with just AVX512F). >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd >> alternative (or just m, as Yv is already covered by the 2nd one)? > I guess xm is more suitable since we still want to allocate > operands[1] to register when sse3_noavx. > It didn't hit any error since for avx and above, alternative 1(2rd > one) is always matched than alternative 2. I'm afraid I don't follow: With just -mavx512f the source operand can be in, say, %xmm16 (as per your clarification above). This would not match Yv, but it would match vm. And hence wrongly create an AVX512VL form of vmovddup. I didn't try it out earlier, because unlike for SFmode / DFmode I thought it's not really clear how to get the compiler to reliably put a DImode variable in an xmm reg, but it just occurred to me that this can be done the same way there. And voila, typedef long long __attribute__((vector_size(16))) v2di; v2di bcst(long long ll) { register long long x asm("xmm16") = ll; asm("nop %%esp" : "+v" (x)); return (v2di){x, x}; } compiled with just -mavx512f (and -O2) produces an AVX512VL insn. I'll make another patch, yet for that I'm then also not sure why you say xm would be more suitable. Yvm allows for registers (with or without AVX, merely SSE being required) just as much as vm does, doesn't it? And I don't think I've found any combination of destination being v and source being xm anywhere. Plus we want to allow for the higher registers when AVX512VL is enabled. Jan