On Mon, Aug 27, 2018 at 1:03 PM Alexander Monakov <amona...@ispras.ru> wrote:
>
> Hello,
>
> I'd like to apply this to resolve PR 85758. In the bug discussion, Marc noted
> that the current fold produces code with higher instruction-level
> parallelism, but also with higher register pressure and longer code if the
> target does not expose and-not patterns.
>
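For reference, the identity behind this pattern is (X & Y) ^ Y == ~X & Y,
and likewise (X ^ Y) & Y == ~X & Y.  A minimal scalar sketch of the two
equivalent forms (function names are illustrative only):

    unsigned before (unsigned x, unsigned y)
    {
      /* Original shape: the inner AND feeds the XOR.  */
      return (x & y) ^ y;
    }

    unsigned after (unsigned x, unsigned y)
    {
      /* Folded shape: a single and-not, e.g. one ANDN on BMI targets.  */
      return ~x & y;
    }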
> Now that we have 2->2 combine, it is able to recover the SSE and-not for
>
> typedef unsigned T __attribute__((vector_size(16)));
>
> void g(T, T);
>
> void f(T a, T b, T m, T s)
> {
>     m &= s;
>     a += m;
>     m ^= s;
>     b += m;
>     g(a, b);
> }
>
> but in the scalar case the costs say the original sequence is cheaper, even
> with -mbmi. I think that is fine.
>
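Worth spelling out: the inner (m & s) above has two uses, so with the :s
qualifier the GIMPLE fold stays out of the way and RTL combine can recover
the and-not.  A sketch of what the two additions compute, assuming the
identity noted earlier (m and s denote the incoming values):

    a += m & s;     /* first use of the inner AND */
    b += ~m & s;    /* (m & s) ^ s simplifies to this */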
> OK for trunk? If a testcase is needed, please tell me how to implement it.

OK.

Richard.

> Alexander
>
>         * match.pd ((X & Y) ^ Y): Add :s qualifier to inner operand.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cb3c93e3e16..d43e52d05cd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1027,7 +1027,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (for opo (bit_and bit_xor)
>       opi (bit_xor bit_and)
>   (simplify
> -  (opo:c (opi:c @0 @1) @1)
> +  (opo:c (opi:cs @0 @1) @1)
>    (bit_and (bit_not @0) @1)))
>
>  /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
>
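To illustrate the effect of the :s (single-use) qualifier, a sketch assuming
the usual match.pd single-use semantics (function names are hypothetical):

    unsigned single_use (unsigned x, unsigned y)
    {
      return (x & y) ^ y;       /* still folded to ~x & y */
    }

    unsigned two_uses (unsigned x, unsigned y, unsigned *p)
    {
      unsigned t = x & y;       /* t has a second use below, ...  */
      *p = t;
      return t ^ y;             /* ... so the fold is now skipped */
    }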
