On Thu, Apr 10, 2008 at 01:02:34PM -0700, Alexander Kolbasov wrote:
> 
> It may be useful to observe prstat -mL which will report micro-state 
> accounting data for each thread. 
> 
prstat -mL takes a *lot* of time with 16k threads.  Nevertheless, I have some
further data: rerunning the dtrace script gave me the top 10 consumers of CPU
time for both CPUs: the winner is "idle" on CPU0 (671 of 2029 samples; the
next highest has 114 samples), and again "idle" on CPU1 (776 of 3891 samples;
the next highest has 106 samples).  Ordinary prstat shows that my process is
often in the sleep state.
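
To put those sample counts in proportion, here is a minimal Python sketch that
converts them to percentages.  Only the counts come from the dtrace output
quoted above; the names `samples` and `share` are made up for illustration.

```python
# Sample counts quoted in the message:
# (idle samples, total samples, samples of the next-highest consumer)
samples = {
    "CPU0": (671, 2029, 114),
    "CPU1": (776, 3891, 106),
}


def share(part, total):
    """Return part/total as a percentage."""
    return 100.0 * part / total


for cpu, (idle, total, runner_up) in samples.items():
    print(f"{cpu}: idle {share(idle, total):.1f}%, "
          f"next highest {share(runner_up, total):.1f}%")
```

On both CPUs "idle" accounts for several times as many samples as the next
highest consumer.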

Furthermore, I do not think that the problem lies in TLB thrashing.  Here
are three different runs:

2^28 B total block size (256MB), 2^14 B chunk size (= 2^{28-14} = 2^14 threads),
2^7 repetitions (= 2^35 B (32 GB) encrypted in total): 33.6 seconds

2^24 B total block size (16MB), 2^10 B chunk size (= again 2^14 threads),
2^7 repetitions (= 2^31 B (2 GB) encrypted in total): 25.4 seconds 

A 16MB block size is only twice the TLB capacity (2048 entries x 4kB = 8MB).
Lowering the block size to 4MB (half the TLB capacity) gives the following:

2^22 B total block size (4 MB), 2^8 B chunk size (= 2^14 threads),
2^7 repetitions ( = 2^29 B (512 MB) encrypted in total): 24.5 seconds
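
The arithmetic behind the three runs can be checked with a short Python
sketch.  The sizes, repetition counts, and timings are the ones quoted above;
the helper `summarize` and the "time per work item" figure (wall time divided
by threads x repetitions) are illustrative assumptions, not measurements from
the test program.

```python
MIB = 2**20

# (log2 block bytes, log2 chunk bytes, log2 repetitions, wall seconds),
# taken from the three runs quoted above.
runs = [
    (28, 14, 7, 33.6),
    (24, 10, 7, 25.4),
    (22, 8, 7, 24.5),
]


def summarize(block_log2, chunk_log2, reps_log2, seconds):
    """Derive thread count, data volume, and throughput for one run."""
    threads = 2 ** (block_log2 - chunk_log2)      # one thread per chunk
    total_bytes = 2 ** (block_log2 + reps_log2)   # block size x repetitions
    work_items = threads * 2 ** reps_log2         # assumption: one item per chunk per repetition
    return {
        "threads": threads,
        "total_mib": total_bytes // MIB,
        "mib_per_s": total_bytes / MIB / seconds,
        "us_per_item": seconds / work_items * 1e6,
    }


# TLB reach as stated in the message: 2048 entries x 4 kB pages.
tlb_reach_mib = 2048 * 4 * 1024 // MIB

print(f"TLB reach: {tlb_reach_mib} MiB")
for run in runs:
    s = summarize(*run)
    print(f"block 2^{run[0]} B: {s['threads']} threads, "
          f"{s['total_mib']} MiB total, {s['mib_per_s']:.1f} MiB/s, "
          f"{s['us_per_item']:.1f} us per work item")
```

The thread count is the same in all three runs, and the data volume shrinks
64-fold from the first run to the last while the wall time drops by only about
a quarter.  That is what one would expect if per-thread overhead, rather than
the amount of memory touched, dominates the run time.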

==

Are there any backoff heuristics in the mutex/CV/whatever code that put a
thread to sleep under high contention?  Adaptive mutexes?  I'm off to browse the
OpenSolaris code on the net.

==

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
