Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

Hongtao Liu via Gcc-patches Wed, 14 Jun 2023 22:23:27 -0700

On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 14.06.2023 09:41, Hongtao Liu wrote:
> > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> >> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> >> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> >> broadcast insns are always shorter.
> >>
> >> Add two new alternatives each, one covering the AVX2 case and one
> >> covering AVX512.
> > I think you can just change assemble output for this first alternative
> > when TARGET_AVX2, use vbroadcastss, else use vshufps since
> > vbroadcastss only accept register operand when TARGET_AVX2. And no
> > need to support 2 extra alternatives which doesn't make sense just
> > make RA more confused about the same meaning of different
> > alternatives.
>
> You mean by switching from "@ ..." to C code using "switch
> (which_alternative)"? I can do that, sure. Yet that'll make for a
> more complicated "length_immediate" attribute then. Would be nice
Yes, you can also do something like
   (set (attr "length_immediate")
     (cond [(eq_attr "alternative" "0")
                   (if_then_else (match_test "TARGET_AVX2)
                    (const_string "")
                   (const_string "1"))
                ...]


> if you could confirm that this is what you want, as I may well
> have misunderstood you.
>
> But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
> different.
Yes, but can we use vpbroadcastd for vec_dupv4si similarly?
>
> >> ---
> >> I'm working from the assumption that the isa attributes to the original
> >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> >> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> >> operand forms already when at least AVX2 is enabled.
> >>
> >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode>
> >> and elsewhere.)
> > Not sure about this part. I grep prefix_extra, seems only used by
> > znver.md/znver4.md for schedule, and only for comi instructions(?the
> > reservation name seems so).
>
> define_attr "length_vex" and define_attr "length" use it, too.
> Otherwise I would have asked whether the attribute couldn't be
> purged from most insns.
>
> My present understanding is that the attribute is wrong on
> vec_dupv4sf (and hence wants dropping from there altogether), and it
> should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
> only for the non-AVX pshufd case. I suspect at least the latter
> would be going to far for doing it "while here" right in this patch.
> Plus I think I have seen various other questionable uses of that
> attribute.
>
> >> Is use of Yv for the source operand really necessary in *vec_dupv4si?
> >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> > allowed for scalar mode, but not for 128/256-bit vector modes.
> >
> > 20204      if (TARGET_AVX512F
> > 20205          && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> > 20206              || VALID_AVX512F_SCALAR_MODE (mode)))
> > 20207        return true;
>
> Okay, so I need to switch input constraints for relevant new
> alternatives to Yv (I actually wonder why I did use v in
> vec_dupv4sf, as it was clear to me that SFmode can be in the high
> 16 xmm registers with just AVX512F).
>
> >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
> >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
> >> alternative (or just m, as Yv is already covered by the 2nd one)?
> > I guess xm is more suitable since we still want to allocate
> > operands[1] to register when sse3_noavx.
> > It didn't hit any error since for avx and above, alternative 1(2rd
> > one) is always matched than alternative 2.
>
> I'm afraid I don't follow: With just -mavx512f the source operand
> can be in, say, %xmm16 (as per your clarification above). This
> would not match Yv, but it would match vm. And hence wrongly
> create an AVX512VL form of vmovddup. I didn't try it out earlier,
> because unlike for SFmode / DFmode I thought it's not really clear
> how to get the compiler to reliably put a DImode variable in an xmm
> reg, but it just occurred to me that this can be done the same way
> there. And voila,
>
> typedef long long __attribute__((vector_size(16))) v2di;
>
> v2di bcst(long long ll) {
>         register long long x asm("xmm16") = ll;
>
>         asm("nop %%esp" : "+v" (x));
>         return (v2di){x, x};
> }
>
> compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
Ah, I see, indeed it's a potential bug for -mavx512f -mavx512vl
I meant with -mavx512vl,
<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name> will be matched
instead of vec_dupv2di since it's put before vec_dupv2di in .md file,
first will be matched first for exactly the same pattern available. So
we just need to handle non-avx512 cases.
Also even w/o <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>, in
the same pattern vec_dupv2di, with -mavx512vl, Yv(alternative 1) will
always be match instead of Yvm(alternative 2) for register operand, so
Yv in Yvm is never be matched that's why I said xm is more suitable
(or enough).
You can use -dp to check .s file which alternative will be matched.
> I'll make another patch, yet for that I'm then also not sure why
> you say xm would be more suitable. Yvm allows for registers (with
> or without AVX, merely SSE being required) just as much as vm
> does, doesn't it? And I don't think I've found any combination of
> destination being v and source being xm anywhere. Plus we want to
> allow for the higher registers when AVX512VL is enabled.
>
> Jan



-- 
BR,
Hongtao

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

Reply via email to