On Wed, Sep 21, 2011 at 1:37 PM, Jakub Jelinek <[email protected]> wrote:
> For vcond{,u} etc. we currently generate vpandn+vpand+vpor
> sequence but SSE4.1+ has instructions for at least some modes
> to handle those 3 in one instruction (haven't benchmarked how much
> faster/slower it is though).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, tested
> on SandyBridge too, AVX2 just eyeballed.
>
> 2011-09-21 Jakub Jelinek <[email protected]>
>
> * config/i386/i386.c (ix86_expand_sse_movcc): Use
> blendvps, blendvpd and pblendvb if possible.
>
> * gcc.dg/vect/vect-cond-7.c: New test.
> * gcc.target/i386/sse4_1-cond-1.c: New test.
> * gcc.target/i386/avx-cond-1.c: New test.
OK with the nits below:
> --- gcc/config/i386/i386.c.jj 2011-09-20 22:21:35.000000000 +0200
> +++ gcc/config/i386/i386.c 2011-09-21 10:09:09.000000000 +0200
> @@ -18905,24 +18905,42 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
>      }
>    else
>      {
> -      op_true = force_reg (mode, op_true);
> +      rtx (*gen) (rtx, rtx, rtx, rtx) = NULL;
> +
>        op_false = force_reg (mode, op_false);
> +      switch (mode)
> +        {
> +        case V4SFmode: if (TARGET_SSE4_1) gen = gen_sse4_1_blendvps; break;
> +        case V2DFmode: if (TARGET_SSE4_1) gen = gen_sse4_1_blendvpd; break;
> +        case V16QImode: if (TARGET_SSE4_1) gen = gen_sse4_1_pblendvb; break;
> +        case V8SFmode: if (TARGET_AVX) gen = gen_avx_blendvps256; break;
> +        case V4DFmode: if (TARGET_AVX) gen = gen_avx_blendvpd256; break;
> +        case V32QImode: if (TARGET_AVX2) gen = gen_avx2_pblendvb; break;
> +        default: break;
Please use "gen = NULL;" here instead of "break;".
> +        }
Please also add appropriate line breaks in the above code...
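
For illustration only, here is a self-contained sketch of one way the switch could be laid out with each statement on its own line and an explicit default. It is not the committed code: vec_mode, blend_gen, struct target and pick_blend are hypothetical stand-ins for GCC's machine modes, gen_* functions and TARGET_* macros, and keeping the initializer alongside the explicit default is only one reading of the two nits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's machine modes, gen_* functions and
   TARGET_* flags; none of these are the real GCC declarations.  */
enum vec_mode { V4SF, V2DF, V16QI, V8SF, V4DF, V32QI, V4SI };
enum blend_gen { GEN_NONE, GEN_BLENDVPS, GEN_BLENDVPD, GEN_PBLENDVB,
                 GEN_BLENDVPS256, GEN_BLENDVPD256, GEN_PBLENDVB256 };
struct target { int sse4_1, avx, avx2; };

/* Return the blend generator for MODE, or GEN_NONE when the caller
   should fall back to the and/andn/or sequence.  */
static enum blend_gen
pick_blend (enum vec_mode mode, const struct target *t)
{
  enum blend_gen gen = GEN_NONE;

  assert (t != NULL);
  switch (mode)
    {
    case V4SF:
      if (t->sse4_1)
        gen = GEN_BLENDVPS;
      break;
    case V2DF:
      if (t->sse4_1)
        gen = GEN_BLENDVPD;
      break;
    case V16QI:
      if (t->sse4_1)
        gen = GEN_PBLENDVB;
      break;
    case V8SF:
      if (t->avx)
        gen = GEN_BLENDVPS256;
      break;
    case V4DF:
      if (t->avx)
        gen = GEN_BLENDVPD256;
      break;
    case V32QI:
      if (t->avx2)
        gen = GEN_PBLENDVB256;
      break;
    default:
      gen = GEN_NONE;
      break;
    }
  return gen;
}
```
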
Thanks,
Uros.
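
P.S. For readers following the thread, a minimal scalar model of why the single blend instruction can stand in for the vpandn+vpand+vpor sequence. It assumes the mask comes from a vector comparison, so every element is either all ones or all zeros; under that assumption, selecting on the top bit of each mask byte gives the same result as the and/andn/or form. The function names are made up for this sketch, and it models only the selection semantics, not the real instructions' operand order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three-instruction form: dst = (mask & t) | (~mask & f), per byte.  */
static void
select_and_andn_or (uint8_t *dst, const uint8_t *mask,
                    const uint8_t *t, const uint8_t *f, size_t n)
{
  assert (dst != NULL && mask != NULL && t != NULL && f != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint8_t) ((mask[i] & t[i]) | (~mask[i] & f[i]));
}

/* Blend-style form: take t where the top bit of the mask byte is set,
   otherwise take f.  */
static void
select_blendv (uint8_t *dst, const uint8_t *mask,
               const uint8_t *t, const uint8_t *f, size_t n)
{
  assert (dst != NULL && mask != NULL && t != NULL && f != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = (mask[i] & 0x80) ? t[i] : f[i];
}
```

The two forms differ for masks that are not all ones or all zeros per byte, which is why the model states the comparison-result assumption up front.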