Sorry, I was responding to Karl Magdsick's point about the cost of switch statements as it relates to Nathaniel G.H.'s point about the cost of translation/generation. In my experience FDO does work for interpreters and translators, because code sequences are pretty predictable things (you just don't tend to choose instructions at random).

Improving dynamically generated code is the best way to improve performance, as that's where you spend your time. However, the time spent compiling means you only want to optimise hot spots. Well thought out, simple just-in-time compilation is often very hard to beat with an optimising compiler. For example, knowing your target architecture well means you can get a really good static register mapping to the host architecture. These lessons are well known in the literature.

I'd still be interested to know if the translator's profiled performance improved using FDO. I have results from writing things in Java, where FDO is par for the course, but they hardly apply to QEMU/GCC.
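
As a rough illustration of the kind of switch-dispatched loop being discussed, here is a minimal sketch of a toy interpreter. The opcodes, the run() function and the file name interp.c are hypothetical and are not taken from QEMU. The build steps in the comment use the -fprofile-generate and -fprofile-use options mentioned in the quoted message. Whether the profile actually helps depends on how representative the training run is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy accumulator machine (not QEMU's). */
enum { OP_HALT, OP_ADDI, OP_SUBI, OP_MULI };

/*
 * Possible FDO build steps with GCC (interp.c is an assumed file name):
 *   gcc -O2 -fprofile-generate interp.c -o interp   instrumented build
 *   ./interp                                        training run, writes profile data
 *   gcc -O2 -fprofile-use interp.c -o interp        rebuild using the profile
 */

/* Run a program of (opcode, operand) byte pairs; return the accumulator. */
int32_t run(const uint8_t *code, size_t len)
{
    int32_t acc = 0;

    for (size_t pc = 0; pc + 1 < len; pc += 2) {
        uint8_t operand = code[pc + 1];

        /* The dispatch switch: with profile data the compiler can see
         * which cases are hot and lay out the branches accordingly. */
        switch (code[pc]) {
        case OP_ADDI: acc += operand; break;
        case OP_SUBI: acc -= operand; break;
        case OP_MULI: acc *= operand; break;
        case OP_HALT: return acc;
        default:      return acc; /* unknown opcode: stop */
        }
    }
    return acc;
}
```

A training run would call run() on programs whose opcode mix resembles the real workload, since the profile only reflects what it has seen.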

Regards,

Ian Rogers
-- http://www.binarytranslator.org/

Daniel Egger wrote:

On 18.04.2005, at 11:51, Ian Rogers wrote:

I'm not sure if you can get GCC to generate code sequences like this, but you probably at least need to use the -fprofile-generate and -fprofile-use options
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html


Feedback optimisation (FDO) will not work for two reasons:
a) qemu itself is something like a realtime compiler so FDO
   will only speed up the compiler but not the generated code
b) FDO will only provide speed boosts if the feedback phase
   has a chance to analyse a representative work pattern that
   is hopefully also repetitive

After all, FDO is mostly about making a size/speed tradeoff
and rearranging code (mostly branches) to avoid branch
mispredictions by the CPU.

Servus,
      Daniel




_______________________________________________
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel
