On Fri, Apr 11, 2008 at 11:23:10AM -0700, David Lutz wrote:
> 
> Take a look at your cache miss rates as you cross the 2^11 boundary.
> My guess is that you will see something start to go through the roof.
> 
cputrack has too much overhead with this many LWPs.  I did run cpustat,
though, in parallel with my experiment, with the following events on
AMD64 at a 1-second interval:

pic0=DC_miss,pic1=DC_dtlb_L1_miss_L2_miss,pic2=IC_itlb_L1_miss_L2_miss
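
For reference, here is a rough sketch of how that measurement could be
scripted.  The event spec is the one quoted above; the `cpustat -c
<eventspec> <interval>` form is the usual invocation.  The helper name
`sum_ticks` and the assumed output layout (time, cpu, event, then one
column per pic) are my own assumptions, so check them against real
output.  cpustat reports per CPU, so the sketch sums the "tick" lines
to get system-wide counts per interval.

```python
# Sketch only: the column layout assumed here (time, cpu, event, pic0..picN)
# should be verified against actual cpustat output before relying on it.
from collections import defaultdict

# Event spec quoted from the message; the trailing "1" is the 1-second interval.
CPUSTAT_CMD = [
    "cpustat", "-c",
    "pic0=DC_miss,pic1=DC_dtlb_L1_miss_L2_miss,pic2=IC_itlb_L1_miss_L2_miss",
    "1",
]


def sum_ticks(output):
    """Sum per-CPU 'tick' lines into system-wide counts per interval.

    Returns {interval: [pic0_total, pic1_total, ...]}, where interval is
    the sample time rounded to the nearest second.
    """
    totals = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a tick sample.
        if len(fields) < 4 or fields[2] != "tick":
            continue
        interval = round(float(fields[0]))
        counts = [int(f) for f in fields[3:]]
        if not totals[interval]:
            totals[interval] = [0] * len(counts)
        for i, count in enumerate(counts):
            totals[interval][i] += count
    return dict(totals)
```

The command list can be handed to `subprocess.run(CPUSTAT_CMD, capture_output=True, text=True)` on a Solaris box (cpustat runs until interrupted unless a count is appended), and the captured text passed to `sum_ticks`.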

The number of data cache misses _does_ increase too, but the DTLB and ITLB
misses are worse.  Both roughly double each time the number of threads
doubles, except that the ITLB miss rate saturates at ~470k/s, and this
saturation happens at the transition between 2048 and 4096 threads.
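
A quick way to check the "doubles, then saturates" pattern from
per-second totals is to look at the ratio between consecutive runs.
This is only a sketch: the function names are made up, and the numbers
in the usage lines are placeholders rather than my measurements (only
the ~470k/s ceiling comes from the data above).

```python
def growth_ratios(rates):
    """Ratio of each rate to the previous one; ~2.0 means the rate doubled."""
    return [b / a for a, b in zip(rates, rates[1:])]


def first_flattening(threads, rates, threshold=1.5):
    """Return (lower, upper) thread counts for the first step whose growth
    ratio drops below threshold, or None if growth never flattens."""
    for i, ratio in enumerate(growth_ratios(rates)):
        if ratio < threshold:
            return threads[i], threads[i + 1]
    return None


# Placeholder values for illustration only, NOT measured data.
threads = [512, 1024, 2048, 4096]
itlb_misses_per_sec = [100_000, 200_000, 400_000, 470_000]
print(first_flattening(threads, itlb_misses_per_sec))
```

The 1.5 threshold is arbitrary; anything comfortably below 2.0 separates "still doubling" from "flattening out".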

All threads execute the same code, which is rather small -- so I see no
reason for this linear increase in the number of ITLB misses with the
number of threads.  OK, more threads = more user<->kernel transitions.
Does Solaris make use of the global bit in page directory/table entries?

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
