Re: [Qemu-devel] Potential to accelerate QEMU for specific architectures

Paolo Bonzini Mon, 27 May 2013 00:00:44 -0700

On 26/05/2013 18:35, Lior Vernia wrote:
> What about no to the first bullet but yes to the second (just x86 on
> ARM)? Any room for significant improvement in that case, starting from
> the foundations of QEMU?


You could write a target-specific translator, yes.  But first of all I
would establish whether you're using 32- or 64-bit, and run some
profiling to see where the hotspot is in your case.

I know that in some scenarios helpers for SSE take a considerable amount
of time (5-10%).  You could look at adding SIMD data types to TCG, and
map them to Neon operations or even to fully-unrolled loops.
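To make the "fully-unrolled loops" option concrete, here is a minimal
sketch in plain C of a packed 32-bit add (the operation x86 calls paddd),
written once as a generic helper-style loop and once as the four lane
operations a code generator could emit inline.  The type vec4x32 and both
function names are made up for illustration; this is not QEMU or TCG code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register seen as 4 x 32-bit lanes. */
typedef struct { uint32_t l[4]; } vec4x32;

/* Helper style: one out-of-line routine looping over the lanes. */
static void paddd_helper(vec4x32 *d, const vec4x32 *s)
{
    for (int i = 0; i < 4; i++) {
        d->l[i] += s->l[i];   /* unsigned add wraps modulo 2^32 */
    }
}

/* Unrolled style: the same four adds with no loop, which is what an
 * inline lowering would look like on a host without a vector unit. */
static void paddd_unrolled(vec4x32 *d, const vec4x32 *s)
{
    d->l[0] += s->l[0];
    d->l[1] += s->l[1];
    d->l[2] += s->l[2];
    d->l[3] += s->l[3];
}
```

On a host with Neon, the four adds would presumably collapse into a
single vector add instead.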

As in other work, ahead-of-time translation can also do a lot more
optimization, including very aggressive dead-code elimination.  For
example, again considering SSE, something like

     pcmpeqw  %xmm0, %xmm1
     pmovmskb %xmm1, %eax
     test     %eax, %eax
     jz       ...

will be translated to a slow sequence in QEMU due to the expensive
pmovmskb.  A custom code generator can observe that %eax is dead after
the test and use a better translation of this idiom.
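As a rough illustration of that idiom, the sketch below models the
sequence in plain C: a literal translation that builds the compare
result and the 16-bit byte mask, and a fused version that only answers
the question the jz asks (is no 16-bit lane equal?).  The vec128 type and
all function names are hypothetical, and the fused form assumes that the
mask in %eax is not needed afterwards; the new value of %xmm1 would still
have to be produced separately if it is live.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an XMM register, viewed as 16 bytes. */
typedef struct { uint8_t b[16]; } vec128;

/* pcmpeqw: each 16-bit lane becomes 0xFFFF if equal, 0x0000 otherwise. */
static vec128 emu_pcmpeqw(vec128 x, vec128 y)
{
    vec128 r;
    for (int i = 0; i < 16; i += 2) {
        int eq = x.b[i] == y.b[i] && x.b[i + 1] == y.b[i + 1];
        r.b[i] = r.b[i + 1] = eq ? 0xFF : 0x00;
    }
    return r;
}

/* pmovmskb: collect the top bit of each byte into a 16-bit mask. */
static uint32_t emu_pmovmskb(vec128 v)
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= (uint32_t)(v.b[i] >> 7) << i;
    }
    return mask;
}

/* Literal translation: returns 1 when the jz would be taken. */
static int jz_taken_literal(vec128 xmm0, vec128 xmm1)
{
    uint32_t eax = emu_pmovmskb(emu_pcmpeqw(xmm0, xmm1));
    return eax == 0;
}

/* Fused translation: never builds the mask, stops at the first match. */
static int jz_taken_fused(vec128 xmm0, vec128 xmm1)
{
    for (int i = 0; i < 16; i += 2) {
        if (xmm0.b[i] == xmm1.b[i] && xmm0.b[i + 1] == xmm1.b[i + 1]) {
            return 0;   /* some lane is equal, so the mask is nonzero */
        }
    }
    return 1;           /* no lane is equal, so the mask is zero */
}
```

Both functions agree on every input; the difference is only in how much
work is done to get there.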

Also, floating-point emulation is always done in software in QEMU due to
different representations (and due to the 80-bit floating-point
registers mostly used by 32-bit x86).  This is going to be slow no
matter what.

Paolo
