see questions below.

On Tue, Dec 10, 2013 at 12:25 AM, Alex Bennée <alex.ben...@linaro.org> wrote:
>
> trent.t...@gmail.com writes:
>
>> Does anyone have profiles on how much time QEMU spends in translating
>> instructions. QEMU does not have a baseline interpreter nor does it
>> translate on trace-granularity. so i imagine QEMU must spend quite a bit
>> of time translating instructions.
>
> Not as much as you'd think. The translation stage isn't very complex and
> blocks only get translated once (modulo exceptions and self modifying
> code). If you run perf on your task you should see most of the time is
> spent in the generated code - if not please send the test case to the
> list.

I took a profile running SPEC CPU2006 403.gcc with the test input on an
Intel Xeon machine. We spent only 44.76% of the time in the code cache
(i.e. about 1.3M ticks in the code cache), while 40.97% of the time was
spent in qemu-system-x86_64. Some of the hot functions in
qemu-system-x86_64 are listed below. *You are right*: we do not spend much
time in the translation routines. Instead, we spend a significant amount of
time in address translation code.

CPU_CLK_UNHALTED       %  Symbol/Functions
         1340512  100.00  anon (tgid:7106 range:0x7f97815ca000-0x7f979a692000)

CPU_CLK_UNHALTED       %  Symbol/Functions
          314655   25.64  address_space_translate_internal
          308942   25.18  cpu_x86_exec
          128922   10.51  ldq_phys
           92345    7.53  cpu_x86_handle_mmu_fault
           62456    5.09  tlb_set_page
           49332    4.02  memory_region_is_ram
           31055    2.53  helper_le_ldq_mmu
           22048    1.80  memory_region_get_ram_addr
           19223    1.57  memory_region_section_get_iotlb
           15873    1.29  tcg_optimize
           14526    1.18  get_page_addr_code
           12601    1.03  memory_region_get_ram_ptr

Xin

>
> I suspect the more useful statistic would be getting a break down of the
> translation blocks and seeing which ones are the most heavily used and
> examining if QEMU has done as good a job as it can of translating them.
>
>> Is it possible for QEMU to obviate some of the translations by attaching a
>> signature (e.g. a hash) with every translated basic block and try to reuse
>> translated basic block based on the signature as much as possible ? Reuses
>> can be a result of rerunning programs or same libraries statically linked
>> to programs.
>
> Your right a translation cache *could* save some translation time,
> especially if you end up translating the same program over and over
> again. Having said that you might find the cost of computing the
> checksum obviates any speed-up from skipping the translation. After all
> QEMU only needs to look at each subject instruction once normally.
>
> Using QEMU linux-user for cross building would be the obvious pain
> point.
> However as the usual use case is building for embedded platforms
> most users are just happy to fully utilise their 80-core build machines
> in preference to having a farm of slow embedded processors.
>
>> This could end up saving some translation time.
>
> I think you would need to do some performance analysis and come up with
> some numbers before you made that assumption.
>
> Cheers,
>
> --
> Alex Bennée
> QEMU/KVM Hacker for Linaro
>
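
To make the signature idea quoted above concrete, here is a minimal sketch of a hash-keyed translation cache. It is not QEMU code and does not use any QEMU API: `TranslationCache`, `lookup`, and `fake_translate` are hypothetical names, the "translation" is a stand-in that just reverses the bytes, and SHA-256 is an arbitrary choice of signature.

```python
import hashlib


class TranslationCache:
    """Toy cache: maps a hash of guest code bytes to a 'translated' block."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical translator callback
        self._blocks = {}            # signature -> translated block
        self.hits = 0
        self.misses = 0

    def lookup(self, guest_bytes):
        # The signature is computed on every lookup, including hits.
        # This is the checksum cost that may cancel out the saving.
        key = hashlib.sha256(guest_bytes).digest()
        cached = self._blocks.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        translated = self._translate(guest_bytes)
        self._blocks[key] = translated
        return translated


def fake_translate(guest_bytes):
    # Stand-in for real code generation: reverse the bytes.
    return guest_bytes[::-1]


if __name__ == "__main__":
    cache = TranslationCache(fake_translate)
    for block in (b"\x90\x90\xc3", b"\x90\x90\xc3", b"\x31\xc0\xc3"):
        cache.lookup(block)
    print("hits:", cache.hits, "misses:", cache.misses)
```

Note that the hash is paid on every lookup while the translation is paid only on a miss, so whether this wins depends on how expensive translation is relative to hashing, which is the measurement asked for in the quoted reply.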