On Fri, Apr 11, 2008 at 11:23:10AM -0700, David Lutz wrote:
>
> Take a look at your cache miss rates as you cross the 2^11 boundary.
> My guess is that you will see something start to go through the roof.

cputrack has too much overhead with that many LWPs.  I did run cpustat,
though, in parallel with my experiment, with the following events on
AMD64; the sampling interval was 1 second:

pic0=DC_miss,pic1=DC_dtlb_L1_miss_L2_miss,pic2=IC_itlb_L1_miss_L2_miss
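That is, something along the lines of the following (a sketch -- the exact
options may have differed slightly, but the event spec and the 1-second
interval are as above):

  cpustat -c pic0=DC_miss,pic1=DC_dtlb_L1_miss_L2_miss,pic2=IC_itlb_L1_miss_L2_miss 1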
The number of data cache misses _does_ increase too, but what is worse are
the DTLB and ITLB misses.  Both roughly double each time the number of
threads doubles, but the ITLB miss rate saturates at ~470k/s, and that
saturation happens at the transition from 2048 to 4096 threads.

All threads are executing the same code, which is rather small -- so I see
no reason for this linear increase in the number of ITLB misses with the
number of threads.  OK, more threads means more user<->kernel transitions.
Does Solaris make use of the global bit in the page directories/tables?
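In case anyone wants to collect the same counters and look at per-second
ITLB totals, something like this should do (a sketch only: it assumes the
usual cpustat column layout of time, cpu, event, pic0, pic1, pic2, the
per-CPU "tick" lines, and a hypothetical 60-sample run; adjust the field
numbers and count as needed):

  cpustat -c pic0=DC_miss,pic1=DC_dtlb_L1_miss_L2_miss,pic2=IC_itlb_L1_miss_L2_miss 1 60 |
      awk '$3 == "tick" { itlb[$1] += $6 } END { for (t in itlb) print t, itlb[t] }' |
      sort -n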