Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

Jan Beulich via Gcc-patches Wed, 14 Jun 2023 02:04:06 -0700

On 14.06.2023 09:41, Hongtao Liu wrote:
> On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
>> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
>> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
>> need for VEX3 in the broadcast ones. When EVEX encoding is required the
>> broadcast insns are always shorter.
>>
>> Add two new alternatives each, one covering the AVX2 case and one
>> covering AVX512.
> I think you can just change assemble output for this first alternative
> when TARGET_AVX2, use vbroadcastss, else use vshufps since
> vbroadcastss only accept register operand when TARGET_AVX2. And no
> need to support 2 extra alternatives which doesn't make sense just
> make RA more confused about the same meaning of different
> alternatives.


You mean by switching from "@ ..." to C code using "switch
(which_alternative)"? I can do that, sure. Yet that'll make for a
more complicated "length_immediate" attribute then. Would be nice
if you could confirm that this is what you want, as I may well
have misunderstood you.

But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
different.

>> ---
>> I'm working from the assumption that the isa attributes to the original
>> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
>> or avx_noavx2 as applicable), as the new earlier alternatives cover all
>> operand forms already when at least AVX2 is enabled.
>>
>> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
>> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode>
>> and elsewhere.)
> Not sure about this part. I grep prefix_extra, seems only used by
> znver.md/znver4.md for schedule, and only for comi instructions(?the
> reservation name seems so).

define_attr "length_vex" and define_attr "length" use it, too.
Otherwise I would have asked whether the attribute couldn't be
purged from most insns.

My present understanding is that the attribute is wrong on
vec_dupv4sf (and hence wants dropping from there altogether), and it
should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
only for the non-AVX pshufd case. I suspect at least the latter
would be going to far for doing it "while here" right in this patch.
Plus I think I have seen various other questionable uses of that
attribute.

>> Is use of Yv for the source operand really necessary in *vec_dupv4si?
>> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> allowed for scalar mode, but not for 128/256-bit vector modes.
> 
> 20204      if (TARGET_AVX512F
> 20205          && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> 20206              || VALID_AVX512F_SCALAR_MODE (mode)))
> 20207        return true;

Okay, so I need to switch input constraints for relevant new
alternatives to Yv (I actually wonder why I did use v in
vec_dupv4sf, as it was clear to me that SFmode can be in the high
16 xmm registers with just AVX512F).

>> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
>> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
>> alternative (or just m, as Yv is already covered by the 2nd one)?
> I guess xm is more suitable since we still want to allocate
> operands[1] to register when sse3_noavx.
> It didn't hit any error since for avx and above, alternative 1(2rd
> one) is always matched than alternative 2.

I'm afraid I don't follow: With just -mavx512f the source operand
can be in, say, %xmm16 (as per your clarification above). This
would not match Yv, but it would match vm. And hence wrongly
create an AVX512VL form of vmovddup. I didn't try it out earlier,
because unlike for SFmode / DFmode I thought it's not really clear
how to get the compiler to reliably put a DImode variable in an xmm
reg, but it just occurred to me that this can be done the same way
there. And voila,

typedef long long __attribute__((vector_size(16))) v2di;

v2di bcst(long long ll) {
        register long long x asm("xmm16") = ll;

        asm("nop %%esp" : "+v" (x));
        return (v2di){x, x};
}

compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
I'll make another patch, yet for that I'm then also not sure why
you say xm would be more suitable. Yvm allows for registers (with
or without AVX, merely SSE being required) just as much as vm
does, doesn't it? And I don't think I've found any combination of
destination being v and source being xm anywhere. Plus we want to
allow for the higher registers when AVX512VL is enabled.

Jan

Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

Reply via email to