Re: [Qemu-devel] [PATCH] cpu: implementing victim TLB for QEMU system emulated TLB

Xin Tong Thu, 23 Jan 2014 05:51:53 -0800

On Thu, Jan 23, 2014 at 5:23 AM, Alex Bennée <alex.ben...@linaro.org> wrote:
>
> trent.t...@gmail.com writes:
>
>> This patch adds a victim TLB to the QEMU system mode TLB.
>>
>> QEMU system mode page table walks are expensive. Measured by running
>> qemu-system-x86_64 under Intel PIN, a TLB miss followed by a walk of
>> the 4-level page table in a guest Linux OS takes ~450 x86 instructions
>> on average.
> <snip>
>>
>> Attached are some performance results taken on the SPECINT2006 train
>> dataset on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine. In
>> summary, the victim TLB improves the performance of qemu-system-x86_64
>> by 11% on average on SPECINT2006, with the highest improvement of 254%
>> in 464.h264ref. The victim TLB does not result in any performance
>> degradation in any of the measured benchmarks. Furthermore, the
>> implemented victim TLB is architecture independent and is expected to
>> benefit other architectures in QEMU as well.
>>
>> Although there are measurement fluctuations, the performance
>> improvements are very significant and by no means within the range of
>> noise.
> <snip>
>
> I'm curious as the implication seems to be that entries are evicted from
> initial TLB lookup before they are "done". What would the impact be of
> simply growing the size of the main TLB cache?


Growing the size of the TLB gives a significant performance improvement
as well. I only have an incomplete set of numbers, but the numbers I do
have show a clear gain. That said, the victim TLB is still a nice
addition: no matter how big you make the TLB, there will always be
conflict misses due to the low associativity of the direct-mapped TLB
table.
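
To illustrate the conflict-miss point, here is a minimal standalone sketch. It is not the patch and does not use QEMU's actual TLB structures; the names (ToyTLB, toy_tlb_access, toy_pingpong_misses) and the sizes are hypothetical, chosen only for illustration. Two pages that map to the same slot of a direct-mapped table evict each other on every access regardless of the table size, while a small fully associative victim table absorbs the ping-pong.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS   12
#define TLB_SIZE    256          /* direct-mapped main table (illustrative) */
#define VTLB_SIZE   8            /* small fully associative victim table */
#define INVALID_TAG UINT64_MAX   /* never a valid page number after the shift */

typedef struct {
    uint64_t tag[TLB_SIZE];      /* main table, indexed by low page bits */
    uint64_t vtag[VTLB_SIZE];    /* victim table, searched linearly */
    unsigned vnext;              /* round-robin replacement index */
    int use_victim;
    unsigned long hits, victim_hits, misses;
} ToyTLB;

static void toy_tlb_init(ToyTLB *t, int use_victim)
{
    assert(t != 0);
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        t->tag[i] = INVALID_TAG;
    }
    for (unsigned i = 0; i < VTLB_SIZE; i++) {
        t->vtag[i] = INVALID_TAG;
    }
    t->vnext = 0;
    t->use_victim = use_victim;
    t->hits = t->victim_hits = t->misses = 0;
}

static void toy_tlb_access(ToyTLB *t, uint64_t vaddr)
{
    uint64_t page = vaddr >> PAGE_BITS;
    unsigned idx = (unsigned)(page & (TLB_SIZE - 1));

    if (t->tag[idx] == page) {   /* fast path: main table hit */
        t->hits++;
        return;
    }
    if (t->use_victim) {
        for (unsigned i = 0; i < VTLB_SIZE; i++) {
            if (t->vtag[i] == page) {
                /* victim hit: swap the entry back into the main table */
                t->vtag[i] = t->tag[idx];
                t->tag[idx] = page;
                t->victim_hits++;
                return;
            }
        }
        /* miss in both: keep the entry being evicted in the victim table */
        if (t->tag[idx] != INVALID_TAG) {
            t->vtag[t->vnext] = t->tag[idx];
            t->vnext = (t->vnext + 1) % VTLB_SIZE;
        }
    }
    t->tag[idx] = page;          /* refill; a real TLB walks the page table here */
    t->misses++;
}

/* Alternate between two pages that share a main-table slot; return full misses. */
static unsigned long toy_pingpong_misses(int use_victim, unsigned rounds)
{
    ToyTLB t;
    uint64_t a = 0;
    uint64_t b = (uint64_t)TLB_SIZE << PAGE_BITS;   /* same index as a */

    toy_tlb_init(&t, use_victim);
    for (unsigned i = 0; i < rounds; i++) {
        toy_tlb_access(&t, a);
        toy_tlb_access(&t, b);
    }
    return t.misses;
}
```

In this toy model, toy_pingpong_misses(0, 1000) counts a full miss on every one of the 2000 accesses, while toy_pingpong_misses(1, 1000) counts only the two cold misses; everything after that is served from the victim table. Doubling TLB_SIZE would not help the first case as long as the two pages still collide on the index.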

>
> What's the current state of instrumentation around the system TLB
> handling? Can we trace the hit rates of the various caches with
> perf/oprofile/whatever (Stefan?)?
>

We do not have any TLB hit/miss tracking in the QEMU mainline code
right now. I think perf/oprofile can tell us how much time we spend in
TLB lookup and TLB refill. We would need TCG-generated instrumentation
to get the TLB hit/miss rate, though.
> --
> Alex Bennée
>
