Steve Sistare wrote:
> Nicolas Michael wrote:
> > trapstat -T is not working properly: It only reports some tsb misses, 
> > but the columns for tlb misses are all 0 (instruction and data). 
> 
> This is the expected result. The T2 processor handles TLB misses in
> hardware, by accessing the TSB directly.  If the mapping is not found
> in the TSB, then a software handler is invoked, and you see the TSB
> miss in trapstat.  If the mapping *is* found in the TSB, then software
> is not involved, hence trapstat cannot see the event.  This
> feature is called hardware tablewalk (HWTW).

Thanks! I have read this before but didn't realize that this also means that
trapstat cannot count TLB misses any more.

> The good news is that the cost of TLB misses is reduced.  The downside
> is that you lose visibility into which page sizes cause the misses.
> You can use "pmap -s" to see what page sizes are used in your processes,
> and if you see a large range of 8K pages, use large pages and see if
> the TLB miss rate that you measure with hardware counters goes down.

For lots of segments (e.g. large SHM and heaps), we're already using large
pages (mostly 4M and some 256M). But text and some mmapped files are mostly 8K
and 64K.

For T1, there were tunables like use_text_pgsz64k, use_text_pgsz4m and
use_initdata_pgsz64k. Are there similar tunables for T2? (kdb doesn't know these
any more on my T2.) Or is this now only controlled through the
disable_*_large_pages variables?

> You can also use the hardware counters to estimate if the "high" TLB
> miss rate really matters.  TLB misses that hit in the TSB and
> hit in the L2$ are very cheap - approx 25 cycles.  The hardware
> counters tell you how many TLB misses miss in the L2 - see the
> counters ITLB_HWTW_miss_L2 and DTLB_HWTW_miss_L2.

That's good to know! Thanks for this information!

Until now, I just looked at the counters ITLB_miss and DTLB_miss, which --
according to a T2 performance document -- are supposed to have a miss cost of
100 cycles. This is what my evaluation was based on. If I understand you
correctly, ITLB_HWTW_miss_L2 and DTLB_HWTW_miss_L2 are a subset of these misses?
What is the miss cost for the latter two counters (is it the mentioned 100 
cycles)?

So the total ITLB miss cost, in cycles, should then be:
    (ITLB_miss - ITLB_HWTW_miss_L2) * 25 + ITLB_HWTW_miss_L2 * 100
And the same for DTLB. Is this correct?
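
To make the estimate concrete, here is a rough sketch of the calculation in
Python. It assumes that the *_HWTW_miss_L2 counters are a subset of the
*_miss counters, and it uses the 25 and 100 cycle figures from this thread
as placeholders, neither of which is confirmed yet. The function and constant
names are made up for illustration.

```python
# Assumed per-miss costs in cycles, taken from this thread and unconfirmed.
TSB_HIT_L2_HIT_CYCLES = 25   # TLB miss that hits in the TSB and in the L2$
HWTW_L2_MISS_CYCLES = 100    # tablewalk that misses the L2$ (assumed cost)


def tlb_miss_cycles(tlb_miss, hwtw_miss_l2,
                    cheap=TSB_HIT_L2_HIT_CYCLES,
                    expensive=HWTW_L2_MISS_CYCLES):
    """Estimate cycles spent on TLB misses for one TLB (ITLB or DTLB).

    tlb_miss     -- value of ITLB_miss or DTLB_miss
    hwtw_miss_l2 -- value of ITLB_HWTW_miss_L2 or DTLB_HWTW_miss_L2
    """
    if tlb_miss < 0 or hwtw_miss_l2 < 0:
        raise ValueError("counter values must be non-negative")
    if hwtw_miss_l2 > tlb_miss:
        # Only valid under the assumption that L2 misses are a subset.
        raise ValueError("hwtw_miss_l2 exceeds tlb_miss; subset assumption fails")
    return (tlb_miss - hwtw_miss_l2) * cheap + hwtw_miss_l2 * expensive


def tlb_miss_share(itlb_miss, itlb_l2, dtlb_miss, dtlb_l2, total_cycles):
    """Fraction of total_cycles attributed to ITLB plus DTLB misses."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    spent = tlb_miss_cycles(itlb_miss, itlb_l2) + tlb_miss_cycles(dtlb_miss, dtlb_l2)
    return spent / total_cycles
```

With made-up numbers, tlb_miss_cycles(1000, 100) gives 900 * 25 + 100 * 100 =
32500 cycles, versus 100000 cycles if every miss were charged at 100.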

Thanks a lot,
Nick.

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
