On Tue, 5 Oct 2021, H.J. Lu wrote:

> On Tue, Oct 5, 2021 at 3:07 AM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Mon, 4 Oct 2021, H.J. Lu wrote:
> >
> > > commit adedd5c173388ae505470df152b9cb3947339566
> > > Author: Jakub Jelinek <ja...@redhat.com>
> > > Date:   Tue May 3 13:37:25 2016 +0200
> > >
> > >     re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')
> > >
> > > optimized bit test on atomic builtin return with lock bts/btr/btc.  But
> > > it works only for unsigned integers since atomic builtins operate on the
> > > 'uintptr_t' type.  It fails on bool:
> > >
> > >   _1 = atomic builtin;
> > >   _4 = (_Bool) _1;
> > >
> > > and signed integers:
> > >
> > >   _1 = atomic builtin;
> > >   _2 = (int) _1;
> > >   _5 = _2 & (1 << N);
> > >
> > > Improve bit test on atomic builtin return by converting:
> > >
> > >   _1 = atomic builtin;
> > >   _4 = (_Bool) _1;
> > >
> > > to
> > >
> > >   _1 = atomic builtin;
> > >   _5 = _1 & (1 << 0);
> > >   _4 = (_Bool) _5;
> > >
> > > and converting:
> > >
> > >   _1 = atomic builtin;
> > >   _2 = (int) _1;
> > >   _5 = _2 & (1 << N);
> > >
> > > to
> > >
> > >   _1 = atomic builtin;
> > >   _6 = _1 & (1 << N);
> > >   _5 = (int) _6;
> >
> > Why not do this last bit with match.pd patterns (and independent of
> > whether _1 is defined by an atomic builtin)?  For the first suggested
> 
> The full picture is
> 
>   _1 = __atomic_fetch_or_* (ptr_6, mask, _3);
>   _2 = (int) _1;
>   _5 = _2 & mask;
> 
> to
> 
>   _1 = __atomic_fetch_or_* (ptr_6, mask, _3);
>   _6 = _1 & mask;
>   _5 = (int) _6;
> 
> It is useful only if the two masks are the same.
> 
> > transform that's likely going to be undone by folding, no?
> >
> 
> The bool case is
> 
>   _1 = __atomic_fetch_or_* (ptr_6, 1, _3);
>   _4 = (_Bool) _1;
> 
> to
> 
>   _1 = __atomic_fetch_or_* (ptr_6, 1, _3);
>   _5 = _1 & 1;
>   _4 = (_Bool) _5;
> 
> Without __atomic_fetch_or_*, the conversion isn't needed.
> After the conversion, optimize_atomic_bit_test_and will
> immediately optimize the code sequence to
> 
>   _6 = .ATOMIC_BIT_TEST_AND_SET (&v, 0, 0, 0);
>   _4 = (_Bool) _6;
> 
> and there is nothing to fold after it.

Hmm, I see - so how about teaching the code that produces
.ATOMIC_BIT_TEST_AND_SET to recognize the alternate forms directly,
instead of doing the intermediate step separately?
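
For reference, the two alternate forms correspond roughly to C source like
the sketch below.  The function names are hypothetical, and it is only an
assumption that these exact sources gimplify to the (_Bool) and (int)
sequences quoted above; the actual testcases in the patch may differ.

```c
#include <stdbool.h>

/* Bool form (hypothetical helper): test-and-set bit 0.  The "& 1" on the
   fetched value is assumed to be what ends up as the (_Bool) conversion
   of the builtin's result.  */
static bool
test_and_set_bit0 (unsigned long *p)
{
  return __atomic_fetch_or (p, 1UL, __ATOMIC_SEQ_CST) & 1;
}

/* Signed form (hypothetical helper): the same mask is used for the
   fetch-or and for the test, which is the only case where the
   transform is useful.  Returns the old value of bit N, still in
   position N.  */
static int
test_and_set_bit_n (int *p, int n)
{
  int mask = 1 << n;
  return __atomic_fetch_or (p, mask, __ATOMIC_SEQ_CST) & mask;
}
```

Both helpers return the previous state of the bit and leave it set, which
is the semantics lock bts provides on x86.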

Sorry for the delay btw, I've been busy all week ...

Thanks,
Richard.
