Hi all,

This series is aimed at 2.10 or beyond. Its goal is to improve TCG
performance by optimizing:

1- Cross-page direct jumps (softmmu only, obviously). Patches 1-4.
2- Indirect branches (softmmu and user-mode). Patches 5-9.
3- tb_jmp_cache hashing in user-mode. Patch 10.

I decided to work on this after reading this paper [1] (code at [2]),
which, among other optimizations, proposes solutions for 1 and 2.

I followed the same overall scheme they follow, that is, to use helpers
to check whether the target vaddr is valid and, if so, jump to its
corresponding translated code (host address) without having to go back
to the exec loop.

My implementation differs from that in the paper in that it uses
tb_jmp_cache instead of adding more caches, which is simpler and
probably more resilient in environments where TLB invalidations are
frequent (in the paper they acknowledge that they limited background
processes to a minimum, which isn't realistic).

These changes require modifications to the targets and, for optimization
number 2, a new TCG opcode to jump to a host address contained in a
register. For now I have only implemented this for the i386 and arm
targets, and for the i386 TCG backend. Other targets/backends can easily
opt in.

The 3rd optimization is implemented in the last patch: it improves
tb_jmp_cache hashing for user-mode by removing the requirement of being
able to clear parts of the cache given a page number, since this
requirement only applies to softmmu.

The series applies cleanly on top of 95b31d709ba34.

The commit logs include many measurements, performed using SPECint06 and
NBench from dbt-bench [3].

Feedback welcome!

Thanks,

		Emilio

[1] "Optimizing Control Transfer and Memory Virtualization in Full
    System Emulators", Ding-Yong Hong, Chun-Chen Hsu, Cheng-Yi Chou,
    Wei-Chung Hsu, Pangfeng Liu, Jan-Jan Wu. ACM TACO, Jan. 2016.
    http://www.iis.sinica.edu.tw/page/library/TechReport/tr2015/tr15002.pdf
[2] https://github.com/tkhsu/quick-android-emulator/tree/quick-qemu
[3] https://github.com/cota/dbt-bench
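
As a rough illustration of the helper-based scheme described above (look
up the target vaddr in a per-CPU jump cache and, on a hit, continue at
the translated host code instead of returning to the exec loop), here is
a minimal standalone sketch. It is not the code from the patches: the
names SketchTB, SketchCPU, sketch_hash, sketch_cache_insert and
sketch_lookup_host_ptr are hypothetical, the cache size is arbitrary,
and the XOR-fold hash is only an assumed example of a hash with no
per-page structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not QEMU's actual definitions. */
#define SKETCH_JMP_CACHE_BITS 12
#define SKETCH_JMP_CACHE_SIZE (1u << SKETCH_JMP_CACHE_BITS)

typedef struct SketchTB {
    uint64_t pc;            /* guest virtual address of the block */
    uint32_t flags;         /* CPU state the translation depends on */
    const void *host_code;  /* start of the translated host code */
} SketchTB;

typedef struct SketchCPU {
    SketchTB *jmp_cache[SKETCH_JMP_CACHE_SIZE]; /* direct-mapped cache */
    const void *exit_to_loop;  /* fallback: code that returns to the exec loop */
} SketchCPU;

/* Assumed hash: fold the upper bits into the index, ignoring page layout. */
static inline unsigned sketch_hash(uint64_t pc)
{
    return (unsigned)((pc ^ (pc >> SKETCH_JMP_CACHE_BITS)) &
                      (SKETCH_JMP_CACHE_SIZE - 1));
}

/* Record a translated block so later lookups of its pc can hit. */
static void sketch_cache_insert(SketchCPU *cpu, SketchTB *tb)
{
    assert(tb != NULL);
    cpu->jmp_cache[sketch_hash(tb->pc)] = tb;
}

/*
 * Helper a translated block would call before a cross-page or indirect
 * jump: return the host address to jump to. On a miss (empty slot, or a
 * slot holding a block for a different pc/flags), return the fallback so
 * that control goes back to the exec loop.
 */
static const void *sketch_lookup_host_ptr(const SketchCPU *cpu,
                                          uint64_t pc, uint32_t flags)
{
    const SketchTB *tb = cpu->jmp_cache[sketch_hash(pc)];

    if (tb != NULL && tb->pc == pc && tb->flags == flags) {
        return tb->host_code;
    }
    return cpu->exit_to_loop;
}
```

In this sketch the value returned by the helper is what a "jump to host
address in a register" opcode would consume. A miss costs nothing more
than the usual return to the exec loop.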