Hi, I think you already said this, but are you using pthread_cond_signal or pthread_cond_broadcast to wake up threads?
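The reason for asking: pthread_cond_signal is only required to unblock at least one waiter, while pthread_cond_broadcast unblocks all of them, so if several threads become runnable at once but only a single signal is sent, the rest stay asleep. Below is a minimal sketch of the two patterns that do release every waiter. It is not taken from your code; release_workers, worker, permits and the other names are made up for illustration, and the worker count is capped at a small value rather than 16k.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 64  /* illustrative cap, not the 2^14 threads from the runs */

/* Hypothetical shared state, all guarded by `lock`. */
static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv    = PTHREAD_COND_INITIALIZER; /* workers wait here */
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER; /* releaser waits here */
static int waiting;  /* workers that have reached the wait loop */
static int permits;  /* work items available to claim */
static int woken;    /* workers that claimed a permit */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    waiting++;
    pthread_cond_signal(&ready);        /* tell the releaser we are about to block */
    while (permits == 0)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&cv, &lock);
    permits--;
    woken++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start n workers, wait until all are blocked, then release them either
 * with one broadcast or with one signal per worker.  Returns the number
 * of workers that woke up and claimed a permit. */
int release_workers(int n, int use_broadcast)
{
    pthread_t tid[MAX_WORKERS];
    int started = 0;

    if (n < 0) n = 0;
    if (n > MAX_WORKERS) n = MAX_WORKERS;

    pthread_mutex_lock(&lock);
    waiting = permits = woken = 0;
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < n; i++)
        if (pthread_create(&tid[started], NULL, worker, NULL) == 0)
            started++;

    pthread_mutex_lock(&lock);
    while (waiting < started)           /* every worker is now inside cond_wait */
        pthread_cond_wait(&ready, &lock);
    permits = started;
    if (use_broadcast) {
        pthread_cond_broadcast(&cv);    /* one call unblocks all waiters */
    } else {
        for (int i = 0; i < started; i++)
            pthread_cond_signal(&cv);   /* each call unblocks at least one waiter */
    }
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    assert(permits == 0);               /* every permit was consumed */
    return woken;
}
```

Calling release_workers(8, 1) uses the broadcast path and release_workers(8, 0) the per-worker signal path; both should return 8. If your code makes work available to many threads but issues only one pthread_cond_signal, that could explain threads sitting in the sleep state while the CPUs show up as idle.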
max

Zeljko Vrba wrote:
> On Thu, Apr 10, 2008 at 01:02:34PM -0700, Alexander Kolbasov wrote:
>
>> It may be useful to observe prstat -mL which will report micro-state
>> accounting data for each thread.
>>
>
> prstat -mL takes a *lot* of time with 16k threads.  Nevertheless, I have some
> further data: rerunning the dtrace gave me the output of the top 10 consumers
> of CPU time for both CPUs: the winner is "idle" on CPU0 (671 samples from 2029
> samples; the next highest has 114 samples), and again "idle" on CPU1 (776/3891
> samples, the next highest has 106 samples).  Ordinary prstat shows that my
> process is often in sleep state.
>
> Furthermore, I do not think that the problem lies in TLB thrashing.  Here
> are three different runs:
>
> 2^28 B total block size (256 MB), 2^14 B chunk size (= also 2^{28-14} threads),
> 2^7 repetitions (= 2^35 B (32 GB) encrypted in total): 33.6 seconds
>
> 2^24 B total block size (16 MB), 2^10 B chunk size (= again 2^14 threads),
> 2^7 repetitions (= 2^31 B (2 GB) encrypted in total): 25.4 seconds
>
> 16 MB block size is only twice the TLB capacity (2048 entries x 4 kB = 8 MB).
> Lowering the block size to 4 MB (half the TLB capacity) gives the following:
>
> 2^22 B total block size (4 MB), 2^8 B chunk size (= 2^14 threads),
> 2^7 repetitions (= 2^29 B (512 MB) encrypted in total): 24.5 seconds
>
> ==
>
> Is there some backoff heuristic in the mutex/CV/whatever code that puts the
> thread to sleep under high contention?  Adaptive mutexes?  I'm off to browse
> the opensolaris code on the net.
>
> ==
>
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org