Hi Krishna,

Krishna Yenduri wrote:
> Hi All,
>
> I have a user level benchmark that does
>     for (i = 0; i < nthreads; i++)
>         (void) thr_create(NULL, 0, testaes, (void *)0,
>             THR_NEW_LWP, &tid);
>
> I found that running this benchmark with nthreads == ncpus
> schedules each thread to a separate CPU. The system is a Niagara 2
> with 128 CPUs/strands.
>
> However, for a kernel module/benchmark that does
>     for (i = 0; i < nthreads; i++)
>         (void) thread_create(NULL, 0, &process_aes, (void *)i, 0, &p0,
>             TS_RUN, minclsyspri);
>
> the scheduling is very uneven and a whole set of CPUs from
> 64-127 did not have any thread scheduled on them. The distribution
> among 0-63 is also uneven.
>
> I assume the thread scheduling behavior is different for system threads
> which do not have a LWP. But, is this not sub optimal? Is the
> assumption that kernel subsystems that need to use a large number
> of threads do their own CPU binding/scheduling to assure even distribution?
>
Not at all. In fact, the dispatcher code is for the most part
scheduling-class agnostic, so modulo lgroup-related differences (which
I wouldn't expect to factor in here, since kernel threads shouldn't
have an affinity to either socket), things should be well distributed
for kernel threads as well.
-Eric
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org