Re: [PATCH] x86 V[24]TImode vec_{init,extract} (PR target/80846)

Uros Bizjak Thu, 20 Jul 2017 09:11:05 -0700

On Thu, Jul 20, 2017 at 9:47 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> Richard has asked me recently to look at V[24]TI vector extraction
> and initialization, which he wants to use from the vectorizer.
>
> The following is an attempt to implement that.
>
> On the testcases included in the patch we usually get better or
> significantly better code; the exception is f1,
> where the change is:
> -       movq    %rdi, -32(%rsp)
> -       movq    %rsi, -24(%rsp)
> -       movq    %rdx, -16(%rsp)
> -       movq    %rcx, -8(%rsp)
> -       vmovdqa -32(%rsp), %ymm0
> +       movq    %rdi, -16(%rsp)
> +       movq    %rsi, -8(%rsp)
> +       movq    %rdx, -32(%rsp)
> +       movq    %rcx, -24(%rsp)
> +       vmovdqa -32(%rsp), %xmm0
> +       vmovdqa -16(%rsp), %xmm1
> +       vinserti128     $0x1, %xmm0, %ymm1, %ymm0
> which is something that is hard to handle before RA.  If the RA
> spilled it the other way around, perhaps it would be solvable by
> transforming
>         vmovdqa -32(%rsp), %xmm1
>         vmovdqa -16(%rsp), %xmm0
>         vinserti128     $0x01, %xmm0, %ymm1, %ymm0
> into
>         vmovdqa -32(%rsp), %ymm0
> using peephole2, but I have no idea how to force it that way.  f11
> has a similar problem, this time with 3 extra insns.  But if the TImode
> variable is allocated in a %?mm* register, we get better code even in those
> cases.
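For context, here is a function of roughly the shape that exercises the V2TImode vec_init path discussed above. It builds a 256-bit vector from two __int128 arguments, which the x86-64 ABI passes in %rdi:%rsi and %rdx:%rcx. This is a hedged sketch using GCC's generic vector extensions. The typedef V2TI and the function init_v2ti are hypothetical names and are not taken from avx-pr80846.c.

```c
/* Hypothetical illustration of a V2TImode vec_init; not the patch's
   testcase.  Inspect the generated code with e.g. gcc -O2 -mavx2 -S.  */
typedef __int128 V2TI __attribute__ ((vector_size (32)));

/* Build a two-element vector of 128-bit integers (vec_initv2ti).  */
V2TI
init_v2ti (__int128 lo, __int128 hi)
{
  return (V2TI) { lo, hi };
}
```

Without -mavx, GCC may emit a -Wpsabi note about the changed ABI for returning a 32-byte vector. The semantics are the same either way.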


Please file a PR about this issue. IIRC, I saw this spill
problem some time ago.

> For V4TImode perhaps we could improve some special cases of vec_initv4ti,
> such as a broadcast, or a single variable element with all others constant,
> but at least for the broadcast I'm not really sure what the optimal sequence is.
> vbroadcasti32x4 is only able to broadcast from memory, which is good if the
> TImode input lives in memory, but what if it doesn't?  __builtin_shuffle
> currently generates vpermq with the indices loaded from memory, but that has
> to wait for the memory load...
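The two ways of writing a V4TImode broadcast mentioned above can be sketched in GCC C as follows. The first uses a vec_init with four identical elements. The second uses __builtin_shuffle with an all-zero mask, which replicates element 0. The typedef V4TI and both function names are hypothetical. The code relies on GCC extensions (vector_size on __int128, __builtin_shuffle) and says nothing about which instruction sequence GCC picks.

```c
/* Hypothetical broadcast helpers for V4TImode (four 128-bit elements).  */
typedef __int128 V4TI __attribute__ ((vector_size (64)));

/* Broadcast special case of vec_initv4ti: every element is x.  */
V4TI
broadcast_v4ti (__int128 x)
{
  return (V4TI) { x, x, x, x };
}

/* Same result via a generic permutation: a mask of all zeros selects
   element 0 of v for every output lane (GCC's __builtin_shuffle).  */
V4TI
broadcast_v4ti_shuffle (V4TI v)
{
  return __builtin_shuffle (v, (V4TI) { 0, 0, 0, 0 });
}
```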
>
> Another thing is that we actually don't permit a normal move instruction
> for V4TImode unless AVX512BW, so we used to generate terrible code (spill it
> into memory using GPRs and then load it back).  Any reason for that?
> I've found:
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01465.html
>> > > -   (V2TI "TARGET_AVX") V1TI
>> > > +   (V4TI "TARGET_AVX") (V2TI "TARGET_AVX") V1TI
>> >
>> > Are you sure TARGET_AVX is the correct condition for V4TI?
>> Right! This should be TARGET_AVX512BW (because corresponding shifts
>> belong to AVX-512BW).
> but it isn't at all clear what shifts this is talking about.  This is VMOVE,
> which is used only in the mov<mode>, mov<mode>_internal and movmisalign<mode>
> patterns, and I fail to see what kind of shifts those would produce.
> They should only produce vmovdqa64, vmovdqu64, vpxord or vpternlogd insns
> with %zmm* operands, and those are all AVX512F already.
>
> Anyway, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Maybe it would be nice to also improve bitwise logical operations on
> V2TI/V4TImode - probably just expanders like {and,ior,xor}v[24]ti
> and maybe __builtin_shuffle.
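The operations that the suggested {and,ior,xor}v[24]ti expanders and a V2TImode __builtin_shuffle would cover look like this at the source level. Bitwise operations do not depend on element width, so in principle each could be a single full-width vpand/vpor/vpxor. The typedef V2TI and all function names here are hypothetical, and the sketch uses only GCC generic vector extensions.

```c
/* Hypothetical V2TImode element-wise operations (GCC vector extensions).  */
typedef __int128 V2TI __attribute__ ((vector_size (32)));

/* Bitwise AND/IOR/XOR: the same bits result regardless of whether the
   256-bit value is viewed as 2 x 128-bit or 4 x 64-bit elements.  */
V2TI and_v2ti (V2TI a, V2TI b) { return a & b; }
V2TI ior_v2ti (V2TI a, V2TI b) { return a | b; }
V2TI xor_v2ti (V2TI a, V2TI b) { return a ^ b; }

/* Swap the two 128-bit halves with a constant permutation mask.  */
V2TI
swap_v2ti (V2TI a)
{
  return __builtin_shuffle (a, (V2TI) { 1, 0 });
}
```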
>
> Richard also talked about V2OImode support, but I'm afraid that is going to
> be way too hard; we don't really have OImode support in most places.
>
> 2017-07-20  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/80846
>         * config/i386/i386.c (ix86_expand_vector_init_general): Handle
>         V2TImode and V4TImode.
>         (ix86_expand_vector_extract): Likewise.
>         * config/i386/sse.md (VMOVE): Enable V4TImode even for just
>         TARGET_AVX512F, instead of only for TARGET_AVX512BW.
>         (ssescalarmode): Handle V4TImode and V2TImode.
>         (VEC_EXTRACT_MODE): Add V4TImode and V2TImode.
>         (*vec_extractv2ti, *vec_extractv4ti): New insns.
>         (VEXTRACTI128_MODE): New mode iterator.
>         (splitter for *vec_extractv?ti first element): New.
>         (VEC_INIT_MODE): New mode iterator.
>         (vec_init<mode>): Consolidate 3 expanders into one using
>         VEC_INIT_MODE mode iterator.
>
>         * gcc.target/i386/avx-pr80846.c: New test.
>         * gcc.target/i386/avx2-pr80846.c: New test.
>         * gcc.target/i386/avx512f-pr80846.c: New test.
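For illustration, these are constant-index extractions of the kind the ChangeLog entries above target. The ChangeLog adds a splitter for the first-element case and new *vec_extractv2ti/*vec_extractv4ti insns. The sketch assumes GCC vector subscripting. The typedefs and function names are hypothetical and not taken from the new tests, and the comments make no claim about the exact instructions emitted.

```c
/* Hypothetical V2TImode/V4TImode element extraction (GCC vector extensions).  */
typedef __int128 V2TI __attribute__ ((vector_size (32)));
typedef __int128 V4TI __attribute__ ((vector_size (64)));

/* Element 0: the "first element" case covered by the new splitter.  */
__int128 extract_v2ti_0 (V2TI v) { return v[0]; }

/* Other constant indices: upper 128-bit lanes of a 256/512-bit vector.  */
__int128 extract_v2ti_1 (V2TI v) { return v[1]; }
__int128 extract_v4ti_3 (V4TI v) { return v[3]; }
```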

LGTM.

Thanks,
Uros.
