trent.t...@gmail.com writes:

> This patch adds a victim TLB to the QEMU system mode TLB.
>
> QEMU system mode page table walks are expensive. Taken by running QEMU
> qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a
> 4-level page tables in guest Linux OS takes ~450 X86 instructions on
> average.
<snip>
>
> Attached are some performance results taken on SPECINT2006 train
> dataset and a Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine. In
> summary, victim TLB improves the performance of qemu-system-x86_64 by
> 11% on average on SPECINT2006 and with highest improvement of in 254%
> in 464.h264ref. And victim TLB does not result in any performance
> degradation in any of the measured benchmarks. Furthermore, the
> implemented victim TLB is architecture independent and is expected to
> benefit other architectures in QEMU as well.
>
> Although there are measurement fluctuations, the performance
> improvement are very significant and by no means in the range of
> noises.
<snip>

I'm curious, as the implication seems to be that entries are evicted
from the main TLB before they are "done", i.e. while the guest is still
using them. What would the impact be of simply growing the size of the
main TLB?

What's the current state of instrumentation around the system TLB
handling? Can we trace the hit rates of the various caches with
perf/oprofile/whatever (Stefan?)?

-- 
Alex Bennée
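
To make the two questions above concrete, here is a rough toy model, not
QEMU code. It assumes a direct-mapped main TLB backed by a small fully
associative victim TLB with oldest-first replacement, and it counts where
each lookup is satisfied. The class name, the 4 KiB page size, and the
default sizes (256 main entries, 8 victim entries) are all assumptions
made for illustration; none of them are taken from the patch.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumed 4 KiB guest pages


class TLBModel:
    """Toy model: direct-mapped main TLB plus a small victim TLB.

    Hypothetical illustration only; sizes and policy are assumptions.
    """

    def __init__(self, main_entries=256, victim_entries=8):
        self.main_entries = main_entries
        self.victim_entries = victim_entries
        self.main = [None] * main_entries   # one page number per slot
        self.victim = OrderedDict()         # page -> True, oldest first
        self.main_hits = 0
        self.victim_hits = 0
        self.walks = 0

    def access(self, vaddr):
        """Look up one virtual address; return 'main', 'victim' or 'walk'."""
        page = vaddr >> PAGE_BITS
        idx = page % self.main_entries
        if self.main[idx] == page:
            self.main_hits += 1
            return "main"

        evicted = self.main[idx]
        if page in self.victim:
            # Victim hit: take the entry back into the main TLB.
            del self.victim[page]
            self.victim_hits += 1
            result = "victim"
        else:
            # Miss in both: this is where the page table walk would happen.
            self.walks += 1
            result = "walk"

        self.main[idx] = page
        if evicted is not None and self.victim_entries > 0:
            # The displaced main entry moves to the victim TLB.
            self.victim[evicted] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # drop the oldest entry
        return result
```

Feeding the same guest address trace to TLBModel(256, 8) and to
TLBModel(512, 0) and comparing the walks counters would be one way to
weigh a victim TLB against simply growing the main TLB. The three
counters are also the kind of per-cache hit rate that tracing would need
to expose.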