On Mon, Jul 25, 2011 at 08:38:13PM +0200, Joerg Sonnenberger wrote:
> On Mon, Jul 25, 2011 at 07:24:57PM +0100, David Laight wrote:
> > On Mon, Jul 25, 2011 at 11:52:52AM +0200, Joerg Sonnenberger wrote:
> > > Much better. One thing remains. It would be nice to replace
> > >     .byte 0xf3,0xc3
> > > with either a simple ret or a ret $0, depending on whether it has a
> > > label on it or not. The reason for this mess seems to be a bug in
> > > certain generation of AMD CPUs. So essentially,
> >
> > IIRC it is something to do with branch prediction?
> > But my memory keeps thinking of a constraint about the number
> > of branches/labels in a cache line - and I'm sure the non-use of
> > 1 byte return instructions was all related.
>
> When I asked around, I get the following reference, which seems to
> summarize the situation nicely:
>
> http://mikedimmick.blogspot.com/2008/03/what-heck-does-ret-mean.html

That is sort of consistent with what I remember from those guides.

I wonder what the additional cost of 'rep ret' and 'ret $0' is on
other CPUs (apart from the obvious extra code bytes).

Looking at the code (now with fewer 'rep ret') I notice that a fair
number of the jumps are unconditional - why have an unconditional
jump to a return instruction?

I also haven't checked what the critical paths are, or what the
static prediction will do. Nor do I know the cycle times of these
special instructions, so I can't tell how much it really matters.

	David

-- 
David Laight: da...@l8s.co.uk