Hello,

On Sat, May 25, 2013 at 10:06 PM, Andreas Färber <afaer...@suse.de> wrote:
> Hi,
>
> On 24.05.2013 21:24, Lior Vernia wrote:
>> I am running x86 applications on an ARM device using QEMU, and found
>> it too slow for my needs.
>
> Before we start going into technical details, what are you trying to
> achieve on a high level and how did you try to do it?
>
> Are you using qemu-system-x86_64 or qemu-x86_64? The latest v1.5.0?
Sorry, right after I wrote the message it occurred to me that I should
have mentioned I was talking about qemu-system, either x86_64 or i386.
So far I have only run the Limbo app on a Galaxy S III with various
images, just to get a sense of the capabilities, and was disappointed.
Limbo seems to run v1.1.0. If you suspect that it's the JNI wrapping
that's causing a lot of the damage, then we can talk about compiling
QEMU for ARM and running it natively; I just haven't been able to get
that to work.

>> This is to be expected, of course, this is
>> not a complaint.
>
> Especially since most people still run on x86 ...
>
>> However, I was wondering whether this could be helped
>> by "overriding" the generic binary translation mechanism and focusing
>> on lower level binary translation just from x86 to ARM.
>>
>> It's clear to me that this isn't a small project, but it might be
>> important enough for me to invest myself in. However, before I jump
>> into it, I wanted to inquire whether this would be worthwhile at all.
>> Does anyone have any estimate as to how big of a gain that could
>> achieve? Or whether a more significant improvement could be achieved
>> by further tweaking that didn't occur to me?

I wanted to add that I've been reading about a Russian startup that is
looking to emulate x86 on ARM at 40% of native speed using dynamic
binary translation (as far as I can gather):
http://www.bit-tech.net/news/hardware/2012/10/04/x86-on-arm/1
So this should be possible. And it can't be very much unlike QEMU, can it?

> ... the tcg/arm/ code does not get a lot of love, so you might be able
> to squeeze some more performance out of it by implementing optional TCG
> ops or optimizing existing implementations. In theory most TCG ops
> should correspond to a machine instruction (where available); there's a
> TCG-level optimizer to create more efficient code, but it's a tradeoff
> between time for code optimization and execution time.
>
> Needless to say, you should enable -O3 optimization (or something)
> for the core C code and not enable debug features in configure for
> your performance measurements. :)
>
> Whatever implementation you experiment with, get familiar with our
> Git-based workflow and try to stay close to qemu.git code, or otherwise
> you'll create a fork with little chance of getting integrated into the
> code base - meaning both we don't get your speedups and you don't get
> our latest features and bugfixes. One such example was the attempt to
> use LLVM instead of TCG.

Thanks, but we're getting slightly ahead of ourselves here :) I'd first
want to make sure that QEMU itself is at fault for the poor performance,
and if that's the case, that there's potential for real improvement,
before I start getting my hands dirty.
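P.S. To make sure we mean the same thing by "lower level" translation,
below is a toy C sketch of direct, template-based x86 -> ARM translation,
i.e. mapping guest instructions straight to host instructions instead of
going through TCG's target-independent IR. This is purely my own
illustration, not QEMU code; the register mapping (guest EAX in r0, ECX
in r1) is made up, and only one instruction pair is implemented (both
encodings hand-checked):

/* Toy direct x86 -> ARM translation: one hard-coded template.
 * Not QEMU code; for illustration only. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Translate the guest instruction at *src into host words in dst.
 * Returns the number of guest bytes consumed, or 0 if unsupported. */
static size_t translate_one(const uint8_t *src, uint32_t *dst,
                            size_t *n_host)
{
    /* x86: 01 C8 = add eax, ecx (assumed mapping: EAX->r0, ECX->r1) */
    if (src[0] == 0x01 && src[1] == 0xC8) {
        dst[(*n_host)++] = 0xE0800001; /* ARM: add r0, r0, r1 */
        return 2;
    }
    return 0; /* a real translator would fall back to a generic path */
}

int main(void)
{
    const uint8_t guest[] = { 0x01, 0xC8 }; /* add eax, ecx */
    uint32_t host[16];
    size_t n_host = 0;

    size_t used = translate_one(guest, host, &n_host);
    printf("consumed %zu guest byte(s), emitted %zu host word(s):\n",
           used, n_host);
    for (size_t i = 0; i < n_host; i++) {
        printf("  0x%08" PRIX32 "\n", host[i]);
    }
    return 0;
}

Of course a real translator also has to handle condition codes, memory
operands, block chaining and so on; the point is only that the
per-instruction mapping can be much tighter than generic IR lowering,
which I assume is what that startup is exploiting. And noted on the
measurement setup - I'll build with --extra-cflags=-O3 and without
--enable-debug (I'm quoting those configure flags from memory, so
correct me if they've changed).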