Re: [Qemu-devel] [PATCH v3 17/27] tcg-ppc64: Implement bswap64

Alexander Graf Tue, 02 Apr 2013 08:24:34 -0700

On 04/02/2013 05:12 PM, Richard Henderson wrote:

On 2013-04-02 07:41, Alexander Graf wrote:

On 2013-04-01 23:34, Alexander Graf wrote:

Is this faster than a load/store with std/ldbrx?


Hmm.  Almost certainly not.  And since we've got stack space
allocated for function calls, we've got scratch space to do it in.

Probably similar for bswap32 too, eh?
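
For readers following along, the two approaches being compared can be sketched in portable C. This is only an illustration under stated assumptions, not the code from the patch: the names bswap64_reg and bswap64_mem are hypothetical, and the memory version merely imitates what a std followed by ldbrx through a stack scratch slot would do on ppc64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: register-only byte swap using shifts and masks,
 * with no memory traffic.  Not the instruction sequence from the patch. */
static uint64_t bswap64_reg(uint64_t x)
{
    /* Swap adjacent bytes, then adjacent 16-bit halves, then 32-bit halves. */
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical helper: byte swap via a round trip through a scratch slot.
 * The volatile slot forces real stores and loads, roughly standing in for
 * a std to the stack followed by an ldbrx from the same address. */
static uint64_t bswap64_mem(uint64_t x)
{
    volatile unsigned char slot[8];
    unsigned char tmp[8];
    int i;

    memcpy(tmp, &x, sizeof(tmp));
    for (i = 0; i < 8; i++) {
        slot[i] = tmp[7 - i];   /* store the bytes in reversed order */
    }
    for (i = 0; i < 8; i++) {
        tmp[i] = slot[i];       /* load them back */
    }
    memcpy(&x, tmp, sizeof(x));
    return x;
}
```

Both helpers return the same value for any input; the open question in the thread is only which form is cheaper on the host CPU.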

Depends - memory load/store doesn't come for free and bswap32 is quite short.


I'll do a tiny bit of benchmarking for power7.


Cool, thanks a bunch :)

Heh. "Almost certainly not" indeed. Unless I've made some silly mistake,
going through memory stalls badly.  No store buffer forwarding on power7?

With the following test case, time reports:

f1        2.967s
f2        8.930s
f3        7.071s
f4        7.166s

And note that f4 is a normal store/load pair, trying to determine what the
store buffer forwarding delay might be.

Yeah, doesn't look like it makes any sense at all to do a load/store cycle then. What a shame :).

Keep in mind that this tests icache hot cycles. However, you might get bad icache penalties due to the long bswap64 sequence. So all the memory latency you see here might also affect the instruction stream when it gets executed. But then again, we only care about the performance of cache hot sequences in the first place....



Alex
