Laurent Desnogues writes:

> On Fri, Jun 15, 2012 at 12:30 AM, Lluís Vilanova <vilan...@ac.upc.edu> wrote:
> [...]
>> Now that I think of it, you will have problems generating code to surround
>> each qemu_ld/st with a lightweight mechanism to get the time. In x86 it
>> would be rdtsc, but you want to generate a host rdtsc instruction inside
>> the code generated by QEMU's TCG, so you should also have to hack TCG (or
>> the code generation pointers) to issue an rdtsc instruction.
>
> Even rdtsc would introduce enough noise that it wouldn't be reliable
> for such a micro measurement: as far as I understand it, this instruction
> can be reordered, so you need to flush the pipeline before issuing it.
> Intel has a document about that:
>   download.intel.com/embedded/software/IA/324264.pdf
> The overhead of their proposed method is so high that it's likely it
> would take longer than the execution of the fast path itself.
>
> IMHO a mix of YeongKyoon Lee way to count ld/st and comparison
> between user mode and softmmu still seems to be the best approach
> (well unless you have access to a cycle accurate simulator :-).

Ah, true; I forgot about the architectural implications. Sometimes you just
assume the nice in-order world :)

Lluis

--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth
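
To illustrate the overhead concern raised in the quoted text, here is a minimal host-side sketch (plain C, not TCG-generated code) that takes two fenced timestamps back to back and reports the smallest gap seen. It is only an assumption-laden illustration: the names fenced_timestamp and min_timer_overhead are hypothetical, and it uses LFENCE around rdtsc as a commonly used lighter substitute for the CPUID-based sequence in the Intel document, which is not reproduced here. How strongly LFENCE orders execution depends on the CPU vendor and configuration. On non-x86-64 hosts it falls back to clock_gettime so that it still compiles.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime in the fallback path */
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>           /* __rdtsc, _mm_lfence */
#else
#include <time.h>
#endif

/* Hypothetical helper: read a timestamp with fences on both sides so the
 * read is not reordered with the surrounding instructions. */
static inline uint64_t fenced_timestamp(void)
{
#if defined(__x86_64__)
    _mm_lfence();                /* let earlier instructions finish first */
    uint64_t t = __rdtsc();
    _mm_lfence();                /* keep later instructions from starting early */
    return t;
#else
    /* Assumption: no TSC available, so use the monotonic clock (in ns). */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Smallest observed gap between two back-to-back fenced reads, i.e. the
 * cost of the measurement itself with nothing in between. */
uint64_t min_timer_overhead(int iters)
{
    assert(iters > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = fenced_timestamp();
        uint64_t t1 = fenced_timestamp();
        if (t1 >= t0 && t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling min_timer_overhead(100000) from a small main and printing the result gives the floor that any per-access measurement would have to stay well above. If that floor is comparable to the cost of the softmmu fast path, the timestamps mostly measure themselves, which is the point made above.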