On Mon, Jun 17, 2019 at 06:26:20PM +0100, Will Deacon wrote:
> On Mon, Jun 17, 2019 at 01:33:19PM +0200, Ard Biesheuvel wrote:
> > On my single core TX2, the comparative performance is as follows
> > 
> > Baseline: REFCOUNT_TIMING test using REFCOUNT_FULL (LSE cmpxchg)
> >       191057942484      cycles                    #    2.207 GHz
> >       148447589402      instructions              #    0.78  insn per cycle
> > 
> >       86.568269904 seconds time elapsed
> > 
> > Upper bound: ATOMIC_TIMING
> >       116252672661      cycles                    #    2.207 GHz
> >        28089216452      instructions              #    0.24  insn per cycle
> > 
> >       52.689793525 seconds time elapsed
> > 
> > REFCOUNT_TIMING test using LSE atomics
> >       127060259162      cycles                    #    2.207 GHz
> 
> Ok, so assuming JC's complaint is valid, then these numbers are compelling.
> In particular, my understanding of this thread is that your optimised
> implementation doesn't actually sacrifice any precision; it just changes
> the saturation behaviour in a way that has no material impact. Kees, is that
> right?

That is my understanding, yes. There is no loss of detection precision.
But for clarity, I should point out that it has one behavioral change, the
same one as on x86: the counter is now effectively a 31-bit counter rather
than a 32-bit counter, because the sign bit is used for saturation.
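
To make the 31-bit point concrete, here is a minimal userspace sketch in
C11 atomics. It is not the arm64 or x86 kernel code: the names sketch_ref,
sketch_ref_inc and SKETCH_SATURATED are hypothetical, and parking the
counter at INT_MIN / 2 is an assumption chosen so that a few racing
increments cannot carry a saturated counter back to a positive value.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturation value (assumption): the middle of the negative range. */
#define SKETCH_SATURATED (INT_MIN / 2)

/* Hypothetical stand-in for a reference counter; not the kernel's refcount_t. */
typedef struct {
    atomic_int refs;
} sketch_ref;

/*
 * Unconditional atomic add, then inspect the value that was there before.
 * Any negative value means "saturated", so only bits 0..30 count references.
 */
static void sketch_ref_inc(sketch_ref *r)
{
    int old = atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);

    /*
     * old < 0:        the counter was already saturated.
     * old == INT_MAX: this add just wrapped the counter into the sign bit
     *                 (C11 atomic arithmetic on signed types wraps silently).
     * In both cases, pin the counter at the saturation value.
     */
    if (old < 0 || old == INT_MAX)
        atomic_store_explicit(&r->refs, SKETCH_SATURATED, memory_order_relaxed);
}

/* A set sign bit is the saturation flag. */
static bool sketch_ref_is_saturated(sketch_ref *r)
{
    return atomic_load_explicit(&r->refs, memory_order_relaxed) < 0;
}
```

The fast path is a single atomic add with no compare-and-swap loop; the
cost is that the top bit no longer holds count.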

> If so, I'm not against having this for arm64, with the premise that we can
> hide the REFCOUNT_FULL option entirely given that it would only serve to
> confuse if exposed.

If the LSE atomics version has overflow, dec-to-zero, and inc-from-zero
protections, then as far as I'm concerned, REFCOUNT_FULL doesn't need
to exist for arm64. On the Kconfig front, as long as there isn't a way
to revert refcount_t to atomic_t, I'm happy. :)
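
For reference, the three protections can be sketched in userspace C11 as
below. This uses a compare-exchange loop, not LSE atomics, and it is not
the kernel implementation; demo_inc_checked, demo_dec_and_test and
DEMO_SATURATED are hypothetical names. The reading of the three terms is
also an assumption: overflow saturates instead of wrapping, an increment
from zero is refused, and only the decrement that takes the count from one
to zero reports a release, while a decrement of a zero or saturated counter
is refused.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturation value (assumption): any negative value is "saturated". */
#define DEMO_SATURATED INT_MIN

/* Returns false only when the increment is refused because the count is zero. */
static bool demo_inc_checked(atomic_int *refs)
{
    int old = atomic_load_explicit(refs, memory_order_relaxed);

    for (;;) {
        if (old == 0)
            return false;   /* inc-from-zero: the object may already be freed */
        if (old < 0)
            return true;    /* saturated: leave the counter pinned */

        /* overflow: saturate instead of wrapping around */
        int next = (old == INT_MAX) ? DEMO_SATURATED : old + 1;

        /* on failure, old is reloaded with the current value and we retry */
        if (atomic_compare_exchange_weak_explicit(refs, &old, next,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
    }
}

/* Returns true only for the caller that dropped the last reference. */
static bool demo_dec_and_test(atomic_int *refs)
{
    int old = atomic_load_explicit(refs, memory_order_relaxed);

    for (;;) {
        if (old <= 0)
            return false;   /* already zero (underflow) or saturated: refuse */

        if (atomic_compare_exchange_weak_explicit(refs, &old, old - 1,
                                                  memory_order_release,
                                                  memory_order_relaxed)) {
            if (old != 1)
                return false;
            /* order the earlier releases before the caller frees the object */
            atomic_thread_fence(memory_order_acquire);
            return true;
        }
    }
}
```

A saturated counter never reaches zero again in this sketch, so the object
is leaked rather than freed while references might remain.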

-- 
Kees Cook
