Re: CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto/arch

David Laight Mon, 25 Jul 2011 15:06:44 -0700

On Mon, Jul 25, 2011 at 08:38:13PM +0200, Joerg Sonnenberger wrote:
> On Mon, Jul 25, 2011 at 07:24:57PM +0100, David Laight wrote:
> > On Mon, Jul 25, 2011 at 11:52:52AM +0200, Joerg Sonnenberger wrote:
> > > Much better. One thing remains. It would be nice to replace
> > >   .byte 0xf3,0xc3
> > > with either a simple ret or a ret $0, depending on whether it has a
> > > label on it or not. The reason for this mess seems to be a bug in
> > > certain generation of AMD CPUs. So essentially,
> > 
> > IIRC it is something to do with branch prediction?
> > But my memory keeps thinking of a constraint about the number
> > of branches/labels in a cache line - and I'm sure the non-use of
> > 1 byte return instructions was all related.
> 
> When I asked around, I get the following reference, which seems to
> summarize the situation nicely:
> 
> http://mikedimmick.blogspot.com/2008/03/what-heck-does-ret-mean.html


That is sort of consistent with what I remember from those guides.
I wonder what the additional cost of 'rep ret' and 'ret $0' is
on other cpus (apart from the obvious extra code byte).

Looking at the code (now with fewer 'rep ret') I notice that a fair
number of the jumps are unconditional - why have an unconditional jump
to a return instruction?
I also haven't checked what the critical paths are, or what the static
prediction will do. Nor do I know the cycle times of these special
instructions, so I can't tell how much it really matters.

        David

-- 
David Laight: da...@l8s.co.uk

Re: CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto/arch
