Hi Krishna,

Krishna Yenduri wrote:
> Hi All,
>
>  I have a user level benchmark that does
>  for (i = 0; i < nthreads; i++)
>       (void) thr_create(NULL, 0, testaes, (void *)0,
>                             THR_NEW_LWP, &tid);
>
>  I found that running this benchmark with nthreads == ncpus
>  schedules each thread to a separate CPU. The system is a Niagara 2
>  with 128 CPUs/strands.
>
>  However, for a kernel module/benchmark that does
>  for (i = 0; i < nthreads; i++)
>     (void) thread_create(NULL, 0, &process_aes, (void *)i, 0, &p0,
>                                 TS_RUN, minclsyspri);
>
>  the scheduling is very uneven and a whole set of CPUs from
>  64-127 did not have any thread scheduled on them. The distribution
>  among 0-63 is also uneven.
>
>  I assume the thread scheduling behavior is different for system threads
>  which do not have an LWP. But is this not suboptimal? Is the
>  assumption that kernel subsystems that need to use a large number
>  of threads do their own CPU binding/scheduling to assure even distribution?
>
Not at all. In fact, the dispatcher code is scheduling-class agnostic for
the most part. So, modulo lgroup-related differences (which I wouldn't
expect to factor in here, since kernel threads shouldn't have an affinity
to either socket), things should be well distributed for kernel threads
as well.
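
One way to put numbers on "uneven" is to have each thread record the ID of
the CPU it runs on and then tally the results. The sketch below is only an
illustration, not code from either benchmark: summarize_cpu_spread() and
cpu_spread_t are hypothetical names, and it assumes the CPU IDs have
already been collected into an array (in user land, for example, with
getcpuid(3C)).

```c
#include <stddef.h>

/* Summary of how the recorded samples were spread across CPUs. */
typedef struct {
	size_t idle_cpus;	/* CPUs that received no sample */
	size_t max_per_cpu;	/* most samples seen on a single CPU */
} cpu_spread_t;

/*
 * Tally one CPU ID per thread into counts[0..ncpus-1] and summarize.
 * IDs outside [0, ncpus) are ignored.
 * Hypothetical helper for illustration; not a Solaris interface.
 */
static cpu_spread_t
summarize_cpu_spread(const int *cpu_ids, size_t nsamples,
    size_t *counts, size_t ncpus)
{
	cpu_spread_t s = { 0, 0 };
	size_t i;

	/* Start from a clean histogram. */
	for (i = 0; i < ncpus; i++)
		counts[i] = 0;

	/* Count how many threads reported each CPU. */
	for (i = 0; i < nsamples; i++) {
		if (cpu_ids[i] >= 0 && (size_t)cpu_ids[i] < ncpus)
			counts[cpu_ids[i]]++;
	}

	/* Derive the two summary figures from the histogram. */
	for (i = 0; i < ncpus; i++) {
		if (counts[i] == 0)
			s.idle_cpus++;
		if (counts[i] > s.max_per_cpu)
			s.max_per_cpu = counts[i];
	}
	return (s);
}
```

With nthreads == ncpus, a perfectly even spread would give idle_cpus == 0
and max_per_cpu == 1. A run in which CPUs 64-127 got nothing would show at
least 64 idle CPUs.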

-Eric
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
