On Mon, Aug 27, 2018 at 1:03 PM Alexander Monakov <amona...@ispras.ru> wrote:
>
> Hello,
>
> I'd like to apply this to resolve PR 85758.  In the bug discussion Marc
> noted that what we are doing leads to code with higher instruction-level
> parallelism, but it also has higher register pressure and longer code if
> the target does not expose and-not patterns.
>
> Now that we have 2->2 combine, it is able to recover SSE and-not for
>
> typedef unsigned T __attribute__((vector_size(16)));
>
> void g(T, T);
>
> void f(T a, T b, T m, T s)
> {
>   m &= s;
>   a += m;
>   m ^= s;
>   b += m;
>   g(a, b);
> }
>
> but in the scalar case costs say the original sequence is cheaper, even
> with -mbmi.  I think it's fine.
>
> OK for trunk?  If a testcase is needed, please tell how to implement it.
OK.

Richard.

> Alexander
>
>         * match.pd ((X & Y) ^ Y): Add :s qualifier to inner operand.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cb3c93e3e16..d43e52d05cd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1027,7 +1027,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (for opo (bit_and bit_xor)
>       opi (bit_xor bit_and)
>   (simplify
> -  (opo:c (opi:c @0 @1) @1)
> +  (opo:c (opi:cs @0 @1) @1)
>    (bit_and (bit_not @0) @1)))
>
>  /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
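
For illustration, here is a minimal scalar analogue of the vector testcase
above (hypothetical; not part of the patch or the thread, and the names f/h
are made up).  It shows the situation the :s qualifier is meant to catch:
the inner (m & s) has a second use, so unconditionally rewriting the xor to
an and-not would discard the shared AND, keep the original m live longer,
and cost an extra NOT on targets without an and-not instruction.

/* Hypothetical scalar analogue (assumed example, not from the patch).
   After m &= s, the value m0 & s has two uses: the addition into a and
   the xor.  The xor computes (m0 & s) ^ s, which matches the match.pd
   pattern (X & Y) ^ Y and would become ~m0 & s.  With :s on the inner
   operand the GIMPLE-level fold is suppressed here because the inner
   AND is not single-use; RTL combine can still form and-not (e.g. SSE
   pandn) where target costs make it profitable.  */

void h(unsigned, unsigned);

void f(unsigned a, unsigned b, unsigned m, unsigned s)
{
  m &= s;    /* m0 & s: used twice below, so not single-use */
  a += m;
  m ^= s;    /* (m0 & s) ^ s: fold candidate blocked by :s */
  b += m;
  h(a, b);
}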