On Tue, May 04, 2021 at 11:21:27PM +1000, David Gwynne wrote:
> On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote:
> > On 04/05/21(Tue) 01:10, Scott Cheloha wrote:
> > > [...] 
> > > I want to run softclock() without the kernel lock.  The way to go, I
> > > think, is to first push the kernel lock down into timeout_run(), and
> > > then to remove the kernel lock from each timeout, one by one.
> > 
> > Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates
> > more latency than running all timeouts in a batch after having grabbed
> > the KERNEL_LOCK().  I doubt this is the best way forward.
> > 
> > > Before we can push the kernel lock down into timeout_run() we need to
> > > remove the kernel lock from timeout_del_barrier(9).
> > 
> > Seems worth it on its own.
> >
> > > The kernel lock is used in timeout_del_barrier(9) to determine whether
> > > the given timeout has stopped running.  Because the softclock() runs
> > > with the kernel lock we currently assume that once the calling thread
> > > has taken the kernel lock, any ongoing softclock() must have returned
> > > and relinquished the lock, so the timeout in question has returned.
> 
> as i'll try to explain below, it's not about waiting for a specific
> timeout, it's about knowing that a currently running timeout has
> finished.

I'm very confused about the distinction between the two.

> > So you want to stop using the KERNEL_LOCK() to do the serialization?  
> 
> i used the KERNEL_LOCK in timeout_barrier cos it was an
> elegant^Wclever^W hack. because timeouts are run under the kernel
> lock, i knew i could take the lock and know that timeouts aren't
> running anymore. that's all.
> 
> in hindsight this means that the thread calling timeout_barrier spins
> when it could sleep, and worse, it spins waiting for all pending
> timeouts to run. so yeah, i agree that undoing the hack is worth it on
> its own.

Okay so we're all in agreement that this could be improved, cool.

> > > The simplest replacement I can think of is a volatile pointer to the
> > > running timeout that we set before leaving the timeout_mutex and clear
> > > after reentering the same during timeout_run().
> > 
> > Sounds like a condition variable protected by this mutex.  Interesting
> > that cond_wait(9) doesn't work with a mutex. 
> 
> cond_wait borrows the sched lock to coordinate between the waiting
> thread and the signalling context. the cond api is basically a wrapper
> around sleep_setup/sleep_finish.
> 
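
To make sure I follow, here's roughly how I picture the cond API's
semantics, as a userspace pthread analogue.  This is a hedged sketch
only: the real kernel code rides on sleep_setup()/sleep_finish() and
the sched lock, not a per-cond mutex, and these names are illustrative.

```c
/*
 * Userspace sketch of the cond(9) semantics using pthreads.
 * This only mirrors the one-waiter/one-signaller behavior; the
 * kernel implementation wraps sleep_setup()/sleep_finish() and
 * does not carry its own mutex.
 */
#include <pthread.h>

struct cond {
	pthread_mutex_t	 c_mtx;
	pthread_cond_t	 c_cv;
	int		 c_done;
};

#define COND_INITIALIZER() \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

void
cond_wait(struct cond *c, const char *wmesg)
{
	(void)wmesg;	/* the kernel shows this as the wchan in ps/top */
	pthread_mutex_lock(&c->c_mtx);
	while (!c->c_done)
		pthread_cond_wait(&c->c_cv, &c->c_mtx);
	pthread_mutex_unlock(&c->c_mtx);
}

void
cond_signal(struct cond *c)
{
	pthread_mutex_lock(&c->c_mtx);
	c->c_done = 1;
	pthread_cond_signal(&c->c_cv);
	pthread_mutex_unlock(&c->c_mtx);
}
```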
> > > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
> > > spins until the timeout function returns and the timeout_running
> > > pointer is changed.  Not every caller can sleep during
> > > timeout_del_barrier(9).  I think spinning is the simplest thing that
> > > will definitely work here.
> > 
> > This keeps the current semantic indeed.
> 
> i don't want to throw timeout_barrier out just yet.

I'm fine with that... where might it be useful?

> > > -void
> > > -timeout_barrier(struct timeout *to)
> > > +int
> > > +timeout_del_barrier(struct timeout *to)
> > >  {
> > > + struct timeout barrier;
> > > + struct cond c = COND_INITIALIZER();
> > >   int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> > >  
> > >   timeout_sync_order(needsproc);
> > >  
> > > - if (!needsproc) {
> > > -         KERNEL_LOCK();
> > > -         splx(splsoftclock());
> > > -         KERNEL_UNLOCK();
> > > - } else {
> > > -         struct cond c = COND_INITIALIZER();
> > > -         struct timeout barrier;
> > > + mtx_enter(&timeout_mutex);
> > > +
> > > + if (timeout_del_locked(to)) {
> > > +         mtx_leave(&timeout_mutex);
> > > +         return 1;
> > > + }
> > >  
> > > + if (needsproc) {
> > >           timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> > >           barrier.to_process = curproc->p_p;
> > > -
> > > -         mtx_enter(&timeout_mutex);
> > >           SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> > >           CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> > >           mtx_leave(&timeout_mutex);
> > > -
> > >           wakeup_one(&timeout_proc);
> > > -
> > >           cond_wait(&c, "tmobar");
> > > + } else {
> > > +         mtx_leave(&timeout_mutex);
> > > +         /* XXX Is this in the right spot? */
> > > +         splx(splsoftclock());
> > > +         while (timeout_running == to)
> > > +                 CPU_BUSY_CYCLE();
> > 
> > Won't splx() execute the soft-interrupt if there's any pending?
> > Shouldn't the barrier be before?  Could you add `spc->spc_spinning++'
> > around the spinning loop?  What happens if two threads call
> > timeout_del_barrier(9) with the same argument at the same time?  Is
> > it possible and/or supported?
> 
> the timeout passed to timeout_barrier is only used to figure out which
> context the barrier should apply to, ie, it's used to pick between the
> softint or proc queue runners. like the other barriers in other parts of
> the kernel, it's just supposed to wait for the current work to finish
> running.

Here is where I get confused.

Why do I want to wait for *all* timeouts in the queue to finish
running?

My understanding of the timeout_del_barrier(9) use case was as
follows:

1. I have a dynamically allocated timeout struct.  The timeout is
   scheduled to execute at some point in the future.

2. I want to free the memory allocated for the timeout.

3. To safely free the memory I need to ensure the timeout
   is not executing.  Until the timeout function, i.e.
   to->to_func(), has returned it isn't necessarily safe to
   free that memory.

4. If I call timeout_del_barrier(9), it will only return if the
   timeout in question is not running.  Assuming the timeout itself
   is not a periodic timeout that reschedules itself we can then
   safely free the memory.

Given this understanding, my approach was to focus on the timeout in
question.  So my code just spins until the timeout is no longer
running.
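
Concretely, the scheme I had in mind looks something like this
userspace sketch, with C11 atomics standing in for timeout_mutex and
the timeout_running pointer.  The names are illustrative, not the
actual kern_timeout.c code:

```c
/*
 * Sketch of the spin-on-one-timeout scheme: publish a pointer to
 * whichever timeout is mid-execution, and have the barrier spin
 * only while *that* timeout is the one running.
 */
#include <stdatomic.h>
#include <sched.h>

struct timeout {
	void	(*to_func)(void *);
	void	 *to_arg;
};

/* Pointer to the timeout currently executing, if any. */
static _Atomic(struct timeout *) timeout_running;

/* Analogue of timeout_run(): publish, run unlocked, clear. */
static void
timeout_run_sketch(struct timeout *to)
{
	atomic_store(&timeout_running, to);	/* before leaving timeout_mutex */
	(*to->to_func)(to->to_arg);		/* runs without the mutex held */
	atomic_store(&timeout_running, NULL);	/* after retaking the mutex */
}

/* Analogue of the !TIMEOUT_PROC barrier path: spin on the one timeout. */
static void
timeout_barrier_spin_sketch(struct timeout *to)
{
	while (atomic_load(&timeout_running) == to)
		sched_yield();	/* CPU_BUSY_CYCLE() in the kernel */
}
```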

But now I think I am mistaken.

IIUC you're telling me (and showing me, in code) that the goal of
timeout_barrier() is to wait for the *whole* softclock() to return,
not just the one timeout.

Why do I want to wait for the whole softclock() or softclock_thread()?
Why not just wait for the one timeout to finish?

> it is unfortunate that the timeout_barrier man page isn't very
> clear. it's worth reading the manpage for intr_barrier and
> taskq_barrier. sched_barrier is a thing that exists too, but isn't
> documented. also have a look at the EXAMPLE in the cond_wait(9)
> manpage. however, don't read the taskq_barrier code. timeout_barrier
> is like intr_barrier in that it uses the argument to find out where
> work runs, but again, it doesn't care about the specific handler
> itself. we could have a timeout_barrier(void) and a
> timeout_barrier_proc(void) instead.

Ack.

> the point im trying to make here is that timeout_barrier isn't a
> snowflake that exists in a vacuum, and should have similar semantics to
> barriers in the other places in the kernel.

Also ack.
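
So the pattern shared by intr_barrier/taskq_barrier/timeout_barrier,
as I now understand it: enqueue a sentinel behind whatever is pending
and wait for the queue runner to execute it.  A userspace sketch of
just that shape, with illustrative names; the kernel versions sleep in
cond_wait(9) while a separate context runs the queue:

```c
/*
 * Barrier-as-sentinel sketch: when the sentinel runs, everything
 * queued before it has finished, so the barrier caller may return.
 */
#include <pthread.h>
#include <stddef.h>

struct work {
	void		(*w_func)(void *);
	void		 *w_arg;
	struct work	 *w_next;
};

static struct work *work_head;
static struct work **work_tail = &work_head;
static pthread_mutex_t work_mtx = PTHREAD_MUTEX_INITIALIZER;

static void
work_enqueue(struct work *w)
{
	pthread_mutex_lock(&work_mtx);
	w->w_next = NULL;
	*work_tail = w;
	work_tail = &w->w_next;
	pthread_mutex_unlock(&work_mtx);
}

/* The queue runner, standing in for softclock()/softclock_thread(). */
static void
work_run(void)
{
	struct work *w;

	pthread_mutex_lock(&work_mtx);
	while ((w = work_head) != NULL) {
		if ((work_head = w->w_next) == NULL)
			work_tail = &work_head;
		pthread_mutex_unlock(&work_mtx);
		(*w->w_func)(w->w_arg);		/* handlers run unlocked */
		pthread_mutex_lock(&work_mtx);
	}
	pthread_mutex_unlock(&work_mtx);
}

static void
work_barrier_func(void *arg)
{
	*(int *)arg = 1;	/* all work queued before us has run */
}

/* Enqueue the sentinel; a real barrier would cond_wait(9) on *done. */
static void
work_barrier(struct work *barrier, int *done)
{
	*done = 0;
	barrier->w_func = work_barrier_func;
	barrier->w_arg = done;
	work_enqueue(barrier);
}
```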

> something like this should work. i haven't kicked it around much
> though, so maybe it doesn't. it would be nice if there was a
> CIRCQ_INSERT_HEAD to use, but that would be a separate change.
> 
> Index: kern_timeout.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_timeout.c,v
> retrieving revision 1.83
> diff -u -p -r1.83 kern_timeout.c
> --- kern_timeout.c    8 Feb 2021 08:18:45 -0000       1.83
> +++ kern_timeout.c    4 May 2021 11:49:46 -0000
> @@ -175,7 +175,7 @@ void softclock_thread(void *);
>  uint32_t timeout_bucket(const struct timeout *);
>  uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
>  void timeout_run(struct timeout *);
> -void timeout_proc_barrier(void *);
> +void timeout_barrier_signal(void *);
>  
>  /*
>   * The first thing in a struct timeout is its struct circq, so we
> @@ -490,34 +490,33 @@ timeout_del_barrier(struct timeout *to)
>  void
>  timeout_barrier(struct timeout *to)
>  {
> -     int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> +     int proc = ISSET(to->to_flags, TIMEOUT_PROC);
> +     struct cond c = COND_INITIALIZER();
> +     struct timeout barrier;
>  
> -     timeout_sync_order(needsproc);
> -
> -     if (!needsproc) {
> -             KERNEL_LOCK();
> -             splx(splsoftclock());
> -             KERNEL_UNLOCK();
> -     } else {
> -             struct cond c = COND_INITIALIZER();
> -             struct timeout barrier;
> +     timeout_sync_order(proc);
>  
> -             timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> -             barrier.to_process = curproc->p_p;
> +     _timeout_set(&barrier, timeout_barrier_signal, &c, proc, KCLOCK_NONE);
> +#if NKCOV > 0
> +     barrier.to_process = curproc->p_p;
> +#endif
>  
> -             mtx_enter(&timeout_mutex);
> -             SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> -             CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> -             mtx_leave(&timeout_mutex);
> +     mtx_enter(&timeout_mutex);
> +     SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> +     CIRCQ_INSERT_TAIL(proc ? &timeout_proc : &timeout_todo,
> +         &barrier.to_list);
> +     mtx_leave(&timeout_mutex);
>  
> +     if (proc)
>               wakeup_one(&timeout_proc);
> +     else
> +             softintr_schedule(softclock_si);
>  
> -             cond_wait(&c, "tmobar");
> -     }
> +     cond_wait(&c, "tmobar");
>  }
>  
>  void
> -timeout_proc_barrier(void *arg)
> +timeout_barrier_signal(void *arg)
>  {
>       struct cond *c = arg;
>  
> 
