On Thu, 10 Jan 2019 at 19:33, Matwey V. Kornilov <matwey.korni...@gmail.com> wrote:
> I am running the same application compiled for aarch64 and armv7l on
> x86_64 platform using qemu-user-linux tools.
>
> I see dramatic performance difference (30 times) between emulated
> architectures: aarch64 runs for ~4 minutes, armv7l runs for ~2 hours.
> I do understand that CPU architecture emulation is inherently slow
> thing, but my question is about the difference.
>
> How could I debug to understand what is the reason for such a big
> difference? I've already tried to run stress-ng compiled for this two
> architectures, but it leads to the same performance per second.
>
> I am running qemu 2.11, should I try other version?
Yes, do try 3.1 -- we have done some overall TCG performance improvements.

For a big difference between target architectures like that, I would start by running some host performance tools on the two runs to see where the time is being spent in the armv7l guest run. Is it all in translated guest code, or is proportionally more time spent in particular parts of the QEMU C code? Does the armv7l version make many more syscalls, or different ones (check with the QEMU -strace option)?

You should also compare performance on real 32-bit vs 64-bit Arm hardware if you can, to confirm that it's not simply that the guest application runs much slower there too. (If you don't have the Arm hardware, you could at least compare x86 32-bit vs 64-bit.)

thanks
-- PMM