On 23/03/16(Wed) 21:35, Mark Kettenis wrote:
> > Date: Mon, 21 Mar 2016 16:51:16 +0100 (CET)
> > From: Mark Kettenis <[email protected]>
> >
> > > Date: Sat, 19 Mar 2016 13:53:07 +0100
> > > From: Martin Pieuchot <[email protected]>
> > >
> > > Applications using multiple threads often call sched_yield(2) to
> > > indicate that one of the threads cannot make any progress because
> > > it is waiting for a resource held by another one.
> > >
> > > One example of this scenario is the _spinlock() implementation of
> > > our librthread. But if you look on https://codesearch.debian.net
> > > you can find much more use cases, notably MySQL, PostgreSQL, JDK,
> > > libreoffice, etc.
> > >
> > > Now the problem with our current scheduler is that the priority of
> > > a thread decreases when it is the "curproc" of a CPU. So the threads
> > > that don't run and sched_yield(2) end up having a higher priority than
> > > the thread holding the resource. Which means that it's really hard for
> > > such multi-threaded applications to make progress, resulting in a
> > > high number of IPIs.
> > > That'd also explain why if you have more CPUs, let's say 4 instead
> > > of 2, your application is more likely to make some progress and
> > > you'll see less stuttering/freezing.
> > >
> > > So what the diff below does is that it penalizes the threads from
> > > multi-threaded applications such that progress can be made. It is
> > > inspired from the recent scheduler work done by Michal Mazurek on
> > > tech@.
> > >
> > > I experimented with various values for "p_priority" and this one
> > > generates the fewest IPIs when watching an HD video in firefox.
> > > Because yes, with this diff, now I can.
> > >
> > > I'd like to know if dereferencing ``p_p'' is safe without holding the
> > > KERNEL_LOCK.
> > >
> > > I'm also interested in hearing from more people using multi-threaded
> > > applications.
> >
> > This doesn't only change the sched_yield() behaviour, but also
> > modifies how in-kernel yield() calls behave for threaded processes.
> > That is probably not right.
>
> So here is a diff that keeps yield() the same and adds the code in the
> sched_yield(2) implementation instead.
I'm not a big fan of code duplication.  What about checking for
p_usrpri >= PUSER instead?
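Something along these lines, an untested sketch with the penalty
computation itself elided:

```c
/*
 * Keep a single yield() and only apply the multi-threaded penalty when
 * the yielding thread runs at a user priority.  In-kernel yield()
 * calls made at kernel priorities (< PUSER) keep the current
 * behaviour, so sys_sched_yield() stays a plain wrapper.
 */
void
yield(void)
{
	struct proc *p = curproc;
	int s;

	SCHED_LOCK(s);
	p->p_priority = p->p_usrpri;
	if (p->p_usrpri >= PUSER) {
		/* penalize threads of multi-threaded processes here */
	}
	p->p_stat = SRUN;
	setrunqueue(p);
	p->p_ru.ru_nvcsw++;
	mi_switch();
	SCHED_UNLOCK(s);
}
```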
> It also changes the logic that
> picks the priority to walk the complete threads list and pick the
> highest priority of all the threads.  That should be enough to make
> sure the calling thread is scheduled behind all other threads from the
> same process. That does require us to grab the kernel lock though.
> So this removes NOLOCK from sched_yield(2). I don't think that is a
> big issue.
>
> Still improves video playback in firefox on my x1.
This is another kind of hack :) It's certainly less intrusive but
I'm still not sure if we want that. Reducing the priority of a
thread to the worst priority of its siblings might still not be
enough.
Now I'd like to know how many times sched_yield() really triggers a
context switch for threaded programs with this version of the diff.
Without any diff I observed that only 20 to 25% of the sched_yield()
calls made by firefox result in a different thread being chosen by
mi_switch().  Something tells me that per-CPU runqueues are to blame
here.
> Index: syscalls.master
> ===================================================================
> RCS file: /cvs/src/sys/kern/syscalls.master,v
> retrieving revision 1.167
> diff -u -p -r1.167 syscalls.master
> --- syscalls.master 21 Mar 2016 22:41:29 -0000 1.167
> +++ syscalls.master 23 Mar 2016 20:33:45 -0000
> @@ -514,7 +514,7 @@
> #else
> 297 UNIMPL
> #endif
> -298 STD NOLOCK { int sys_sched_yield(void); }
> +298 STD { int sys_sched_yield(void); }
> 299 STD NOLOCK { pid_t sys_getthrid(void); }
> 300 OBSOL t32___thrsleep
> 301 STD { int sys___thrwakeup(const volatile void *ident, \
> Index: kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.129
> diff -u -p -r1.129 kern_synch.c
> --- kern_synch.c 9 Mar 2016 13:38:50 -0000 1.129
> +++ kern_synch.c 23 Mar 2016 20:33:45 -0000
> @@ -1,4 +1,4 @@
> -/* $OpenBSD: kern_synch.c,v 1.129 2016/03/09 13:38:50 mpi Exp $ */
> +/* $openbsd: kern_synch.c,v 1.129 2016/03/09 13:38:50 mpi Exp $ */
> /* $NetBSD: kern_synch.c,v 1.37 1996/04/22 01:38:37 christos Exp $ */
>
> /*
> @@ -432,7 +432,24 @@ wakeup(const volatile void *chan)
> int
> sys_sched_yield(struct proc *p, void *v, register_t *retval)
> {
> - yield();
> + struct proc *q;
> + int s;
> +
> + SCHED_LOCK(s);
> + /*
> + * If one of the threads of a multi-threaded process called
> + * sched_yield(2), drop its priority to ensure its siblings
> + * can make some progress.
> + */
> + p->p_priority = p->p_usrpri;
> + TAILQ_FOREACH(q, &p->p_p->ps_threads, p_thr_link)
> + p->p_priority = max(p->p_priority, q->p_priority);
> + p->p_stat = SRUN;
> + setrunqueue(p);
> + p->p_ru.ru_nvcsw++;
> + mi_switch();
> + SCHED_UNLOCK(s);
> +
> return (0);
> }
>
>