Paolo Bonzini <pbonz...@redhat.com> writes:

> On 24/07/2017 23:03, Pranith Kumar wrote:
>> This patch increases the number of entries we allow in the TLB. I went
>> over a few architectures to see if increasing it is problematic. Only
>> armv6 seems to have a limitation that only 8 bits can be used for
>> indexing these entries. For other architectures, I increased the
>> number of TLB entries to a 4K-sized cache.
>>
>> Signed-off-by: Pranith Kumar <bobby.pr...@gmail.com>
>
> How did you benchmark this, and can you plot (at least for x86 hosts)
> the results as CPU_TLB_BITS_MAX grows from 8 to 12?

Pranith has some numbers, but what we were seeing is the re-fill path
creeping up the perf profiles. Because it is so expensive to re-compute
the entries, pushing up the TLB size does ameliorate the problem.

That said, I don't think increasing the TLB size is our only solution.
What I've asked for is some sort of idea of the pattern for the eviction
of entries from the TLB and the performance of the victim cache. It may
be that tweaking the locality of that cache would be enough.

One idea I had: with an 8-bit TLB you could afford to have 256
dynamically grown arrays in the victim path, one per entry. Then at
flush time you could simply count up the number of victims in the array
for that slot. That would give you a good idea of whether some regions
are hotter than others.

--
Alex Bennée
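
To make the per-slot victim counting idea concrete, here is a rough
simulation sketch. It is Python rather than QEMU's C and is not taken
from the QEMU sources: it assumes a direct-mapped TLB indexed by the low
bits of the page number and 4 KiB pages, and the names VictimProfiler,
access and flush are hypothetical.

```python
PAGE_BITS = 12  # assumption: 4 KiB pages
TLB_BITS = 8    # 8-bit index, so 256 slots


class VictimProfiler:
    """Toy direct-mapped TLB that records evicted pages per slot."""

    def __init__(self, tlb_bits=TLB_BITS):
        self.size = 1 << tlb_bits
        self.slots = [None] * self.size
        # One dynamically grown list of victims per TLB slot.
        self.victims = [[] for _ in range(self.size)]

    def access(self, vaddr):
        """Look up vaddr; return True on a hit, False on a miss."""
        page = vaddr >> PAGE_BITS
        idx = page & (self.size - 1)
        resident = self.slots[idx]
        if resident == page:
            return True
        if resident is not None:
            # The resident entry is evicted: record it as a victim.
            self.victims[idx].append(resident)
        self.slots[idx] = page
        return False

    def flush(self):
        """Return the victim count per slot, then reset all state."""
        counts = [len(v) for v in self.victims]
        self.slots = [None] * self.size
        self.victims = [[] for _ in range(self.size)]
        return counts
```

Feeding a guest address trace through access() and looking at the list
returned by flush() would show whether evictions cluster in a few slots
or are spread evenly, which is the hot-region question raised above.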