https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81496

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #0)
> With -O2 -mavx{,2,512f}, we get the following for this testcase:
> 
> typedef __int128 V __attribute__((vector_size (32)));
> typedef long long W __attribute__((vector_size (32)));
> typedef int X __attribute__((vector_size (16)));
> typedef __int128 Y __attribute__((vector_size (64)));
> typedef long long Z __attribute__((vector_size (64)));
> 
> W f1 (__int128 x, __int128 y) { return (W) ((V) { x, y }); }
> W f2 (__int128 x, __int128 y) { return (W) ((V) { y, x }); }
> 
>         movq    %rdi, -16(%rsp)
>         movq    %rsi, -8(%rsp)
>         movq    %rdx, -32(%rsp)
>         movq    %rcx, -24(%rsp)
>         vmovdqa -32(%rsp), %xmm0
>         vmovdqa -16(%rsp), %xmm1
>         vinserti128     $0x1, %xmm0, %ymm1, %ymm0
> for f1, which I'm afraid is hard to do anything about, because the RA
> did not see any benefit in spilling in a different order; but for f2:
>         movq    %rdx, -32(%rsp)
>         movq    %rcx, -24(%rsp)
>         vmovdqa -32(%rsp), %xmm0
>         movq    %rdi, -16(%rsp)
>         movq    %rsi, -8(%rsp)
>         vinserti128     $0x1, -16(%rsp), %ymm0, %ymm0
> Before scheduling, the vmovdqa sits right next to the vinserti128 that
> loads from the adjacent memory slot; there it might be a win to use a
> single vmovdqa -32(%rsp), %ymm0 instead.
> Though, the MEM has just A128 in the RTL dump, so we may need to use
> vmovdqu instead, unless we can prove the slot is 256-bit aligned (it is
> in this case, but not in general).
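
For reference, a minimal sketch (not from the bug report) of the single-load
sequence described above, written with AVX intrinsics; the function name
f2_combined and the local buffer are hypothetical, with the buffer standing
in for the two adjacent 16-byte stack slots:

#include <immintrin.h>

typedef long long W __attribute__((vector_size (32)));

W
f2_combined (__int128 x, __int128 y)
{
  /* Mirrors the stack layout from the testcase: y in the lower slot,
     x in the higher one, 32 contiguous bytes in total.  */
  __int128 buf[2] = { y, x };
  /* One unaligned 256-bit load covers both slots, replacing the
     vmovdqa + vinserti128 pair.  vmovdqu is used because only 16-byte
     alignment is guaranteed in general.  */
  return (W) _mm256_loadu_si256 ((const __m256i *) buf);
}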

Maybe we can introduce a helper similar to movdi_to_sse on 32-bit targets,
but one that handles TImode on 64-bit targets?
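
For illustration only, a rough intrinsics sketch of the kind of GPR-to-SSE
move such a helper could emit for a TImode value, avoiding the stack round
trip; the name ti_to_xmm is hypothetical, the real helper would emit RTL
inside the compiler, and whether this beats the spill/reload depends on the
target:

#include <immintrin.h>

/* Move a TImode value, passed in a GPR pair, directly into an XMM
   register: vmovq for the low half, vpinsrq for the high half.
   Requires SSE4.1 (for _mm_insert_epi64) and a 64-bit target.  */
__m128i
ti_to_xmm (__int128 t)
{
  __m128i lo = _mm_cvtsi64_si128 ((long long) t);
  return _mm_insert_epi64 (lo, (long long) (t >> 64), 1);
}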
