Applications using multiple threads often call sched_yield(2) to indicate that one of the threads cannot make any progress because it is waiting for a resource held by another one.
One example of this scenario is the _spinlock() implementation of our librthread. But if you look on https://codesearch.debian.net you can find many more use cases, notably MySQL, PostgreSQL, JDK, libreoffice, etc.

Now the problem with our current scheduler is that the priority of a thread decreases while it is the "curproc" of a CPU. So the threads that don't run and call sched_yield(2) end up having a higher priority than the thread holding the resource. Which means that it's really hard for such multi-threaded applications to make progress, resulting in a large number of IPIs. That'd also explain why, if you have more CPUs, let's say 4 instead of 2, your application is more likely to make some progress and you'll see less stuttering/freezing.

So what the diff below does is penalize the threads of multi-threaded applications such that progress can be made. It is inspired by the recent scheduler work done by Michal Mazurek on tech@.

I experimented with various values for "p_priority" and this is the one that generates the fewest IPIs when watching a HD video on firefox. Because yes, with this diff, now I can.

I'd like to know if dereferencing ``p_p'' is safe without holding the KERNEL_LOCK. I'm also interested in hearing from more people using multi-threaded applications.

Index: kern/sched_bsd.c
===================================================================
RCS file: /cvs/src/sys/kern/sched_bsd.c,v
retrieving revision 1.43
diff -u -p -r1.43 sched_bsd.c
--- kern/sched_bsd.c	9 Mar 2016 13:38:50 -0000	1.43
+++ kern/sched_bsd.c	19 Mar 2016 12:21:36 -0000
@@ -298,7 +298,16 @@ yield(void)
 	int s;
 
 	SCHED_LOCK(s);
-	p->p_priority = p->p_usrpri;
+	/*
+	 * If one of the threads of a multi-threaded process called
+	 * sched_yield(2), drop its priority to ensure its siblings
+	 * can make some progress.
+	 */
+	if (TAILQ_FIRST(&p->p_p->ps_threads) == p &&
+	    TAILQ_NEXT(p, p_thr_link) == NULL)
+		p->p_priority = p->p_usrpri;
+	else
+		p->p_priority = min(MAXPRI, p->p_usrpri * 2);
 	p->p_stat = SRUN;
 	setrunqueue(p);
 	p->p_ru.ru_nvcsw++;
