On 26/05/2013 18:35, Lior Vernia wrote:
> What about no to the first bullet but yes to the second (just x86 on
> ARM)? Any room for significant improvement in that case, starting from
> the foundations of QEMU?

You could write a target-specific translator, yes. But first of all I would establish whether you're using 32- or 64-bit, and run some profiling to see where the hotspot is in your case.

I know that in some scenarios helpers for SSE take a considerable amount of time (5-10%). You could look at adding SIMD data types to TCG, and map them to Neon operations or even to fully-unrolled loops.

As in other work, ahead-of-time translation can also do a lot more optimizations, including very aggressive dead-code elimination. For example, again considering SSE, something like

    pcmpeqw  %xmm0, %xmm1
    pmovmskb %xmm1, %eax
    test     %eax, %eax
    jz       ...

will be translated to a slow sequence in QEMU due to the expensive pmovmskb. A custom code generator can observe that %eax is dead and use a better translation of this idiom.

Also, floating-point emulation is always done in software in QEMU due to different representations (and due to the 80-bit floating-point registers mostly used by 32-bit x86). This is going to be slow no matter what.

Paolo
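
To make the pcmpeqw/pmovmskb/test/jz idiom concrete, here is a rough sketch in plain C. It is not QEMU code: the type xmm_t and all function names are hypothetical, invented for illustration. It models the literal instruction-by-instruction emulation next to a fused version that only answers the question the branch asks, namely whether any 16-bit lane matched. The fused form assumes that %eax is dead after the test, and that the pcmpeqw result in %xmm1 is either dead too or still produced separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit XMM register: eight 16-bit lanes. */
typedef struct { uint16_t w[8]; } xmm_t;

/* pcmpeqw: each 16-bit lane becomes 0xFFFF if the lanes are equal, else 0. */
static xmm_t emu_pcmpeqw(xmm_t a, xmm_t b)
{
    xmm_t r;
    for (int i = 0; i < 8; i++)
        r.w[i] = (a.w[i] == b.w[i]) ? 0xFFFF : 0;
    return r;
}

/* pmovmskb: gather the top bit of each of the 16 bytes into a 16-bit mask. */
static uint32_t emu_pmovmskb(xmm_t a)
{
    uint8_t bytes[16];
    uint32_t mask = 0;
    memcpy(bytes, a.w, sizeof bytes);
    for (int i = 0; i < 16; i++)
        mask |= (uint32_t)(bytes[i] >> 7) << i;
    return mask;
}

/* Literal translation: build the mask in eax, then test it.
 * Returns true when the jz would be taken. */
static bool jz_taken_literal(xmm_t x0, xmm_t x1)
{
    uint32_t eax = emu_pmovmskb(emu_pcmpeqw(x0, x1));
    return eax == 0;
}

/* Idiom-aware translation: with eax dead, the mask itself is never needed.
 * The branch is taken exactly when no lane compares equal. */
static bool jz_taken_fused(xmm_t x0, xmm_t x1)
{
    for (int i = 0; i < 8; i++)
        if (x0.w[i] == x1.w[i])
            return false;
    return true;
}

/* Self-check: both translations must agree on the branch outcome. */
static void check_agree(xmm_t x0, xmm_t x1)
{
    assert(jz_taken_literal(x0, x1) == jz_taken_fused(x0, x1));
}
```

On an ARM host the fused check could plausibly be a Neon compare followed by a single "any lane set" reduction, but that mapping is an assumption here and not something measured.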