From: 'Naveen N. Rao'
> Sent: 23 January 2017 19:22
> On 2017/01/15 09:00AM, Benjamin Herrenschmidt wrote:
> > On Fri, 2017-01-13 at 23:22 +0530, 'Naveen N. Rao' wrote:
> > > > That rather depends on whether the processor has a store to load
> > > > forwarder
> > > > that will satisfy the read from the store buffer.
> > > > I don't know about ppc, but at least some x86 will do that.
> > >
> > > Interesting - good to know that.
> > >
> > > However, I don't think powerpc does that and in-register swap is likely
> > > faster regardless. Note also that gcc prefers this form at higher
> > > optimization levels.
> >
> > Of course powerpc has a load-store forwarder these days, however, I
> > wouldn't be surprised if the in-register form was still faster on some
> > implementations, but this needs to be tested.
>
> Thanks for clarifying! To test this, I wrote a simple (perhaps naive)
> test that just issues a whole lot of endian swaps and in _that_ test, it
> does look like the load-store forwarder is doing pretty well.
...
> This is all in a POWER8 vm. On POWER7, the in-register variant is around
> 4 times faster than the ldbrx variant.
...

I wonder which is faster on the little 1GHz embedded ppc we use here.

	David
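
For anyone wanting to try the comparison on their own hardware, here is a minimal portable sketch of the two forms being discussed. It is not the test mentioned above (that was elided), and the function names are made up for illustration. The memory variant only mimics the data flow of a store followed by a byte-reversed load; on powerpc the real thing would use lwbrx/ldbrx, so this says nothing definitive about store-to-load forwarding by itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* In-register variant: swap the four bytes using only shifts and masks. */
static uint32_t swap32_inreg(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
}

/*
 * Memory round-trip variant (hypothetical stand-in for store + lwbrx):
 * store the value to a volatile slot, then read its bytes back in
 * reversed order and reassemble them in native order.
 */
static uint32_t swap32_via_memory(uint32_t x)
{
    volatile uint32_t slot = x;   /* the store */
    const volatile unsigned char *p = (const volatile unsigned char *)&slot;
    unsigned char rev[4] = { p[3], p[2], p[1], p[0] };   /* reversed loads */
    uint32_t out;

    memcpy(&out, rev, sizeof out);
    return out;
}

/* Sanity check: both variants agree, and swapping twice is the identity. */
static void check_variants(uint32_t x)
{
    assert(swap32_inreg(x) == swap32_via_memory(x));
    assert(swap32_inreg(swap32_inreg(x)) == x);
}
```

Calling each function in a tight loop over a large buffer and timing the two loops would give a rough comparison, with the usual caveat that the compiler may turn the in-register form into a single byte-reverse instruction at higher optimization levels.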