On Wed, Jan 27, 2021 at 11:22:57AM +0100, Uros Bizjak wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux.  Is this ok for trunk
> > (as exception), or for GCC 12?
> 
> If there is no urgent need, I'd rather see to obey stage-4 and wait
> for gcc-12. There is PR98375 meta bug to track gcc-12 pending patches.

Okay.

> > 2021-01-27  Jakub Jelinek  <ja...@redhat.com>
> >
> >         PR target/98737
> >         * config/i386/sync.md (neg; mov; lock xadd; add peephole2): New
> >         define_peephole2.
> >         (*atomic_fetch_sub_cmp<mode>): New define_insn.
> >
> >         * gcc.target/i386/pr98737.c: New test.
> 
> OK, although this peephole is quite complex and matched sequence is
> easily perturbed. Please note that reg-reg move is due to RA to
> satisfy register constraint; if the value is already in the right
> register, then the sequence won't match. Do we need additional pattern
> with reg-reg move omitted?

If there is no reg-reg move, then it is impossible to prove that it is a
negation.  The use of lock xadd forces addition instead of subtraction,
and additionally overwrites its register operand with the old memory value,
so for the comparison one needs another register that holds the value the
xadd operand had initially.  And we need to prove that this value is a
negation.
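
As a rough illustration only (a C model, not the sync.md pattern itself; the
function names are made up for this sketch), this is what the matched
sequence computes and why the second copy of the negated value is needed:

```c
#include <stdbool.h>

/* What the user wrote: subtract, then test the new value against zero. */
bool sub_fetch_is_zero (long *p, long v)
{
  return __atomic_sub_fetch (p, v, __ATOMIC_SEQ_CST) == 0;
}

/* Hypothetical model of the neg; mov; lock xadd; add sequence.  */
bool xadd_model_is_zero (long *p, long v)
{
  long neg = -v;          /* neg: xadd can only add, so negate first.  */
  long copy = neg;        /* mov: keep a second copy of -v.  */
  /* lock xadd: memory += neg; the register operand now holds the OLD
     memory value, so neg itself is no longer available there.  */
  long old = __atomic_fetch_add (p, neg, __ATOMIC_SEQ_CST);
  long res = old + copy;  /* add: recompute the new value for the compare.  */
  return res == 0;
}
```

Both functions leave the same value in memory and return the same result;
without the copy there would be nothing left to add back after the xadd.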

> In the PR, Ulrich suggested to also handle other arith/logic
> operations, but matching these would be even harder, as they are
> emitted using cmpxchg loop. Maybe middle-end could emit a special
> version of the "boolean" atomic insn, if only flags are needed?

I guess we could add new optabs for the atomic builtins whose result
(with the *_fetch operation rather than fetch_*) is compared ==/!= against 0.
I'm not sure we could easily do anything else, because what exact kind of
comparison it is then is heavily machine dependent, and the backend would
then need to emit everything including branches (like e.g. the addv<mode>4
etc. expanders).
Would equality comparison against 0 handle the most common cases?

The user can write it as
__atomic_sub_fetch (x, y, z) == 0
or
__atomic_fetch_sub (x, y, z) - y == 0
though, so the expansion code would need to be able to cope with both.
And the latter form is where all kinds of interfering optimizations pop up,
e.g. for the subtraction it will actually be optimized into
__atomic_fetch_sub (x, y, z) == y
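
A minimal sketch of the three spellings side by side, assuming long operands
and seq_cst ordering (the function names are hypothetical, chosen just for
this example):

```c
#include <stdbool.h>

/* Form 1: the *_fetch builtin returns the new value.  */
bool form_sub_fetch (long *x, long y)
{
  return __atomic_sub_fetch (x, y, __ATOMIC_SEQ_CST) == 0;
}

/* Form 2: the fetch_* builtin returns the old value, so subtract again.  */
bool form_fetch_sub (long *x, long y)
{
  return __atomic_fetch_sub (x, y, __ATOMIC_SEQ_CST) - y == 0;
}

/* Form 3: what form 2 is expected to look like after folding.  */
bool form_folded (long *x, long y)
{
  return __atomic_fetch_sub (x, y, __ATOMIC_SEQ_CST) == y;
}
```

All three ask whether the value stored in *x after the subtraction is zero,
so an expander keyed only on the first spelling would miss the other two.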

        Jakub
