Emilio G. Cota <c...@braap.org> writes:

> Hi all,
>
> This series is aimed at 2.10 or beyond. Its goal is to improve
> TCG performance by optimizing:
>
> 1- Cross-page direct jumps (softmmu only, obviously). Patches 1-4.
> 2- Indirect branches (softmmu and user-mode). Patches 5-9.
> 3- tb_jmp_cache hashing in user-mode. Patch 10.
>
> I decided to work on this after reading this paper [1] (code at [2]),
> which among other optimizations it proposes solutions for 1 and 2.
> I followed the same overall scheme they follow, that is to use helpers
> to check whether the target vaddr is valid, and if so, jump to its
> corresponding translated code (host address) without having to go back
> to the exec loop. My implementation differs from that in the paper
> in that it uses tb_jmp_cache instead of adding more caches,
> which is simpler and probably more resilient in environments
> where TLB invalidations are frequent (in the paper they acknowledge
> that they limited background processes to a minimum, which isn't
> realistic).

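To make the scheme quoted above concrete, here is a toy model of the lookup such a helper performs: hash the target vaddr into a direct-mapped cache, validate the entry, and return the host address on a hit or nothing on a miss (in which case control would go back to the exec loop). This is only a sketch under assumptions. The names TB, JmpCache and jmp_cache_hash, the cache size and the hash function are all made up for illustration and are not taken from the series or from QEMU's sources.

```python
from dataclasses import dataclass
from typing import List, Optional

CACHE_BITS = 12                  # illustrative size, not taken from the series
CACHE_SIZE = 1 << CACHE_BITS


@dataclass
class TB:
    """Hypothetical stand-in for a translated block."""
    pc: int         # guest virtual address the block starts at
    cs_base: int    # extra CPU state that must match for the block to be valid
    flags: int
    host_addr: int  # stand-in for the address of the translated host code


def jmp_cache_hash(pc: int) -> int:
    # Made-up hash: fold the upper bits of the vaddr into the index bits.
    return (pc ^ (pc >> CACHE_BITS)) & (CACHE_SIZE - 1)


class JmpCache:
    """Direct-mapped cache from guest vaddr to translated block."""

    def __init__(self) -> None:
        self.entries: List[Optional[TB]] = [None] * CACHE_SIZE

    def insert(self, tb: TB) -> None:
        # A colliding entry is simply overwritten.
        self.entries[jmp_cache_hash(tb.pc)] = tb

    def flush(self) -> None:
        # Model of a full invalidation, e.g. after a TLB flush.
        self.entries = [None] * CACHE_SIZE

    def lookup(self, pc: int, cs_base: int, flags: int) -> Optional[int]:
        """Return the host address on a valid hit, None on a miss."""
        tb = self.entries[jmp_cache_hash(pc)]
        if (tb is not None and tb.pc == pc
                and tb.cs_base == cs_base and tb.flags == flags):
            return tb.host_addr
        return None  # caller would fall back to the exec loop
```

A None result here corresponds to the slow path; the more often the cache is flushed, the more often that path is taken, which is why the invalidation counts matter.
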
Hi Emilio,

If you want to get some numbers on TLB invalidations please have a look
at my WIP branch:

  https://github.com/stsquad/qemu/tree/misc/tlb-flush-stats

It's mainly an experiment in how easy it is to extract numeric data using
QEMU's trace subsystem (pretty easy, it turns out). I had started looking
at the execution trace but got a little bogged down with re-implementing
hashes in Python. It would be nice if we could just load the C
implementation through ctypes (or maybe just save the computed hashes in
another trace point rather than inferring them via exec_tb).

>
> These changes require modifications on the targets and, for optimization
> number 2, a new TCG opcode to jump to a host address contained in a register.
>
> For now I only implemented this for the i386 and arm targets, and
> the i386 TCG backend. Other targets/backends can easily opt-in.
>
> The 3rd optimization is implemented in the last patch: it improves
> tb_jmp_cache hashing for user-mode by removing the requirement of
> being able to clear parts of the cache given a page number, since this
> requirement only applies to softmmu.
>
> The series applies cleanly on top of 95b31d709ba34.
>
> The commit logs include many measurements, performed using SPECint06 and
> NBench from dbt-bench[3].
>
> Feedback welcome! Thanks,

Given my notes above I think it would be worthwhile coming up with some
trace-points in the helpers and hash lookups so we can analyse their
behaviour as well as just looking at the performance improvement in
benchmarks.

>
> Emilio
>
> [1] "Optimizing Control Transfer and Memory Virtualization
> in Full System Emulators", Ding-Yong Hong, Chun-Chen Hsu, Cheng-Yi Chou,
> Wei-Chung Hsu, Pangfeng Liu, Jan-Jan Wu. ACM TACO, Jan. 2016.
> http://www.iis.sinica.edu.tw/page/library/TechReport/tr2015/tr15002.pdf
>
> [2] https://github.com/tkhsu/quick-android-emulator/tree/quick-qemu
>
> [3] https://github.com/cota/dbt-bench

--
Alex Bennée