On Thu, Jan 23, 2014 at 5:23 AM, Alex Bennée <alex.ben...@linaro.org> wrote:
>
> trent.t...@gmail.com writes:
>
>> This patch adds a victim TLB to the QEMU system mode TLB.
>>
>> QEMU system mode page table walks are expensive. Taken by running QEMU
>> qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a
>> 4-level page tables in guest Linux OS takes ~450 X86 instructions on
>> average.
> <snip>
>>
>> Attached are some performance results taken on SPECINT2006 train
>> dataset and a Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine. In
>> summary, victim TLB improves the performance of qemu-system-x86_64 by
>> 11% on average on SPECINT2006 and with highest improvement of in 254%
>> in
>> 464.h264ref. And victim TLB does not result in any performance
>> degradation in any of the measured benchmarks. Furthermore, the
>> implemented victim TLB is architecture independent and is expected to
>> benefit other architectures in QEMU as well.
>>
>> Although there are measurement fluctuations, the performance
>> improvement are very significant and by no means in the range of
>> noises.
> <snip>
>
> I'm curious as the implication seems to be that entries are evicted from
> initial TLB lookup before they are "done". What would the impact be of
> simply growing the size of the main TLB cache?

Growing the size of the TLB gives a significant performance improvement
as well. I only have an incomplete set of numbers so far, but the ones I
have show a significant gain. That said, a victim TLB is still a nice
addition: no matter how big you make the TLB, there will always be
conflict misses, because the direct-mapped TLB table has low
associativity.

>
> What's the current state of instrumentation around the system TLB
> handling? Can we trace the hit rates of the various caches with
> perf/oprofile/whatever (Stefan?)?
>

We do not have any TLB hit/miss tracking in the QEMU mainline code right
now. I think perf/oprofile can tell us how much time we spend in TLB
lookup and TLB refill, but we would need TCG-generated instrumentation to
get the TLB hit/miss rate.

> --
> Alex Bennée
>
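
To make the conflict-miss point above concrete, here is a toy model in C.
It is only a sketch under stated assumptions, not the patch itself and
not QEMU's real data structures: the names (ToyTLB, toy_tlb_access,
toy_conflict_walks), the table sizes, and the round-robin victim
replacement are all made up for illustration. Two pages that map to the
same direct-mapped slot are touched alternately; the model counts how
many accesses would need a page table walk with and without a small
fully associative victim TLB.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS 12
#define TLB_SIZE  256  /* direct-mapped main TLB; hypothetical size */
#define VTLB_SIZE 8    /* small fully associative victim TLB; hypothetical size */

typedef struct {
    uint64_t vpage;  /* virtual page number cached in this entry */
    bool valid;
} TLBEntry;

typedef struct {
    TLBEntry main_tlb[TLB_SIZE];
    TLBEntry victim[VTLB_SIZE];
    unsigned victim_next;  /* round-robin replacement cursor (assumption) */
    bool use_victim;
    unsigned long walks;   /* number of simulated page table walks */
} ToyTLB;

static void toy_tlb_init(ToyTLB *t, bool use_victim)
{
    memset(t, 0, sizeof(*t));
    t->use_victim = use_victim;
}

/* Simulate one memory access; bump t->walks when both TLBs miss. */
static void toy_tlb_access(ToyTLB *t, uint64_t vaddr)
{
    uint64_t vpage = vaddr >> PAGE_BITS;
    TLBEntry *e = &t->main_tlb[vpage & (TLB_SIZE - 1)];

    if (e->valid && e->vpage == vpage) {
        return;  /* hit in the direct-mapped table */
    }

    if (t->use_victim) {
        for (unsigned i = 0; i < VTLB_SIZE; i++) {
            TLBEntry *v = &t->victim[i];
            if (v->valid && v->vpage == vpage) {
                /* Victim hit: swap it with the conflicting main entry. */
                TLBEntry tmp = *e;
                *e = *v;
                *v = tmp;
                return;
            }
        }
        /* Miss in both: keep the entry we are about to evict. */
        if (e->valid) {
            t->victim[t->victim_next] = *e;
            t->victim_next = (t->victim_next + 1) % VTLB_SIZE;
        }
    }

    t->walks++;  /* stands in for the expensive page table walk */
    e->vpage = vpage;
    e->valid = true;
}

/* Alternate between two pages that share one direct-mapped slot. */
static unsigned long toy_conflict_walks(bool use_victim, int rounds)
{
    ToyTLB t;
    toy_tlb_init(&t, use_victim);

    uint64_t a = 0x1000;
    uint64_t b = a + ((uint64_t)TLB_SIZE << PAGE_BITS);  /* same index as a */

    for (int i = 0; i < rounds; i++) {
        toy_tlb_access(&t, a);
        toy_tlb_access(&t, b);
    }
    return t.walks;
}
```

In this model, the two pages keep evicting each other from the
direct-mapped table, so without the victim TLB every access walks the
page table. With the victim TLB, only the first touch of each page does.
Making TLB_SIZE larger changes which addresses collide, but two pages
whose page numbers are congruent modulo TLB_SIZE will still conflict.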