On 23/03/16(Wed) 21:35, Mark Kettenis wrote:
> > Date: Mon, 21 Mar 2016 16:51:16 +0100 (CET)
> > From: Mark Kettenis <[email protected]>
> >
> > > Date: Sat, 19 Mar 2016 13:53:07 +0100
> > > From: Martin Pieuchot <[email protected]>
> > >
> > > Applications using multiple threads often call sched_yield(2) to
> > > indicate that one of the threads cannot make any progress because
> > > it is waiting for a resource held by another one.
> > >
> > > One example of this scenario is the _spinlock() implementation of
> > > our librthread. But if you look on https://codesearch.debian.net
> > > you can find much more use cases, notably MySQL, PostgreSQL, JDK,
> > > libreoffice, etc.
> > >
> > > Now the problem with our current scheduler is that the priority of
> > > a thread decreases when it is the "curproc" of a CPU. So the threads
> > > that don't run and sched_yield(2) end up having a higher priority than
> > > the thread holding the resource. Which means that it's really hard for
> > > such multi-threaded applications to make progress, resulting in a
> > > high number of IPIs.
> > > That'd also explain why if you have more CPUs, let's say 4 instead
> > > of 2, your application is more likely to make some progress and
> > > you'll see less stuttering/freezing.
> > >
> > > So what the diff below does is that it penalizes the threads from
> > > multi-threaded applications such that progress can be made. It is
> > > inspired from the recent scheduler work done by Michal Mazurek on
> > > tech@.
> > >
> > > I experimented with various values for "p_priority" and this one
> > > generates the fewest IPIs when watching an HD video in firefox.
> > > Because yes, with this diff, now I can.
> > >
> > > I'd like to know if dereferencing ``p_p'' is safe without holding the
> > > KERNEL_LOCK.
> > >
> > > I'm also interested in hearing from more people using multi-threaded
> > > applications.
> >
> > This doesn't only change the sched_yield() behaviour, but also
> > modifies how in-kernel yield() calls behave for threaded processes.
> > That is probably not right.
>
> So here is a diff that keeps yield() the same and adds the code in the
> sched_yield(2) implementation instead.
I'm not a big fan of code duplication.  What about checking for
p_usrpri >= PUSER instead?
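Something along these lines, an untested sketch with the penalty
computation itself elided:

```c
/*
 * Keep a single yield() and only apply the multi-threaded penalty when
 * the yielding thread runs at a user priority.  In-kernel yield()
 * calls made at kernel priorities (< PUSER) keep the current
 * behaviour, so sys_sched_yield() stays a plain wrapper.
 */
void
yield(void)
{
	struct proc *p = curproc;
	int s;

	SCHED_LOCK(s);
	p->p_priority = p->p_usrpri;
	if (p->p_usrpri >= PUSER) {
		/* penalize threads of multi-threaded processes here */
	}
	p->p_stat = SRUN;
	setrunqueue(p);
	p->p_ru.ru_nvcsw++;
	mi_switch();
	SCHED_UNLOCK(s);
}
```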
> It also changes the logic that
> picks the priority to walk the complete threads list and pick the
> highest priority of all the threads.  That should be enough to make
> sure the calling thread is scheduled behind all other threads from the
> same process. That does require us to grab the kernel lock though.
> So this removes NOLOCK from sched_yield(2). I don't think that is a
> big issue.
>
> Still improves video playback in firefox on my x1.
This is another kind of hack :) It's certainly less intrusive but
I'm still not sure if we want that. Reducing the priority of a
thread to the worst priority of its siblings might still not be
enough.
Now I'd like to know how many times sched_yield() really triggers a
context switch for threaded programs with this version of the diff.
Without any diff I observed that only 20 to 25% of the sched_yield()
calls made by firefox result in a different thread being chosen by
mi_switch().  Something tells me that per-CPU runqueues are to blame
here.
> Index: syscalls.master
> ===================================================================
> RCS file: /cvs/src/sys/kern/syscalls.master,v
> retrieving revision 1.167
> diff -u -p -r1.167 syscalls.master
> --- syscalls.master 21 Mar 2016 22:41:29 -0000 1.167
> +++ syscalls.master 23 Mar 2016 20:33:45 -0000
> @@ -514,7 +514,7 @@
> #else
> 297 UNIMPL
> #endif
> -298 STD NOLOCK { int sys_sched_yield(void); }
> +298 STD { int sys_sched_yield(void); }
> 299 STD NOLOCK { pid_t sys_getthrid(void); }
> 300 OBSOL t32___thrsleep
> 301 STD { int sys___thrwakeup(const volatile void *ident, \
> Index: kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.129
> diff -u -p -r1.129 kern_synch.c
> --- kern_synch.c 9 Mar 2016 13:38:50 -0000 1.129
> +++ kern_synch.c 23 Mar 2016 20:33:45 -0000
> @@ -1,4 +1,4 @@
> -/* $OpenBSD: kern_synch.c,v 1.129 2016/03/09 13:38:50 mpi Exp $ */
> +/* $openbsd: kern_synch.c,v 1.129 2016/03/09 13:38:50 mpi Exp $ */
> /* $NetBSD: kern_synch.c,v 1.37 1996/04/22 01:38:37 christos Exp $ */
>
> /*
> @@ -432,7 +432,24 @@ wakeup(const volatile void *chan)
> int
> sys_sched_yield(struct proc *p, void *v, register_t *retval)
> {
> - yield();
> + struct proc *q;
> + int s;
> +
> + SCHED_LOCK(s);
> + /*
> + * If one of the threads of a multi-threaded process called
> + * sched_yield(2), drop its priority to ensure its siblings
> + * can make some progress.
> + */
> + p->p_priority = p->p_usrpri;
> + TAILQ_FOREACH(q, &p->p_p->ps_threads, p_thr_link)
> + p->p_priority = max(p->p_priority, q->p_priority);
> + p->p_stat = SRUN;
> + setrunqueue(p);
> + p->p_ru.ru_nvcsw++;
> + mi_switch();
> + SCHED_UNLOCK(s);
> +
> return (0);
> }
>
>