On Thu, May 06, 2021 at 11:43:55AM -0500, Scott Cheloha wrote:
> On Wed, May 05, 2021 at 11:05:06AM +1000, David Gwynne wrote:
> > On Tue, May 04, 2021 at 11:54:31AM -0500, Scott Cheloha wrote:
> > > 
> > > [...]
> > > 
> > > Here is where I get confused.
> > > 
> > > Why do I want to wait for *all* timeouts in the queue to finish
> > > running?
> > 
> > You don't, you just want to know that whatever timeout was running
> > has finished cos you don't know if that currently running timeout is
> > yours or not.
> > 
> > > My understanding of the timeout_del_barrier(9) use case was as
> > > follows:
> > > 
> > > 1. I have a dynamically allocated timeout struct.  The timeout is
> > >    scheduled to execute at some point in the future.
> > > 
> > > 2. I want to free the memory allocated for the timeout.
> > > 
> > > 3. To safely free the memory I need to ensure the timeout
> > >    is not executing.  Until the timeout function, i.e.
> > >    to->to_func(), has returned it isn't necessarily safe to
> > >    free that memory.
> > > 
> > > 4. If I call timeout_del_barrier(9), it will only return if the
> > >    timeout in question is not running.  Assuming the timeout itself
> > >    is not a periodic timeout that reschedules itself we can then
> > >    safely free the memory.
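> > > As a sketch, that pattern in a hypothetical detach path (sc and
> > > sc_to are made-up names; assumes the handler never reschedules
> > > itself):
> > >
> > > ```c
> > > int
> > > foo_detach(struct foo_softc *sc)
> > > {
> > > 	/*
> > > 	 * Dequeue the timeout if pending, then wait for any
> > > 	 * in-flight run of its handler to return before freeing
> > > 	 * the memory the handler's argument points at.
> > > 	 */
> > > 	timeout_del_barrier(&sc->sc_to);
> > > 	free(sc, M_DEVBUF, sizeof(*sc));
> > > 	return (0);
> > > }
> > > ```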
> > 
> > Barriers aren't about references to timeouts, they're about references
> > to the work associated with a timeout.
> > 
> > If you only cared about the timeout struct itself, then you can get
> > all the information about whether it's referenced by the timeout
> > queues from the return values from timeout_add and timeout_del.
> > timeout_add() returns 1 if the timeout subsystem takes the reference
> > to the timeout. If the timeout is already scheduled it returns 0
> > because it's already got a reference. timeout_del returns 1 if the
> > timeout itself was scheduled and it removed it, therefore giving
> > up its reference. If you timeout_del and it returns 0, then it
> > wasn't on a queue inside timeouts. Easy.
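> > As a sketch of just those return values (sc_to is a hypothetical
> > timeout in a driver softc):
> >
> > ```c
> > /* 1: newly scheduled, the subsystem took the reference. */
> > /* 0: already pending, the subsystem already had the reference. */
> > scheduled = timeout_add_sec(&sc->sc_to, 1);
> >
> > /* 1: it was pending and we dequeued it, giving up the reference. */
> > /* 0: it was not on any timeout queue. */
> > pending = timeout_del(&sc->sc_to);
> > ```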
> > 
> > What timeout_add and timeout_del don't tell you is whether the work
> > referenced by the timeout is currently running. The timeout runners
> > copy the function and argument onto the stack when they dequeue a
> > timeout to run, and give up the reference to the timeout when the
> > mutex around the timeout wheels/cirqs is released. However, the argument
> > is still referenced on the stack and the function to process it may be
> > about to run or is running. You can't tell that from the timeout_add/del
> > return values though.
> > 
> > We provide two ways to deal with that. One is you have reference
> > counters (or similar) on the thing you're deferring to a timeout.
> > The other is you use a barrier so you know the work you deferred
> > isn't on the timeout runner's stack anymore because it's moved on
> > to run the barrier work.
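> > Sketched, the two options side by side in a teardown path
> > (hypothetical softc; the refcnt(9) calls are only one example of
> > how option one might look):
> >
> > ```c
> > /* Option 1: reference counts on the thing the work points at. */
> > if (timeout_del(&sc->sc_to))
> > 	refcnt_rele(&sc->sc_refs);	/* drop the queue's reference */
> > refcnt_finalize(&sc->sc_refs, "fooref"); /* wait for handler refs */
> >
> > /* Option 2: a barrier to wait out any in-flight handler. */
> > timeout_del_barrier(&sc->sc_to);
> > ```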
> > 
> > This is consistent with tasks and interrupts.
> > 
> > > Given this understanding, my approach was to focus on the timeout in
> > > question.  So my code just spins until the timeout is no longer
> > > running.
> > > 
> > > But now I think I am mistaken.
> > > 
> > > IIUC you're telling me (and showing me, in code) that the goal of
> > > timeout_barrier() is to wait for the *whole* softclock() to return,
> > > not just the one timeout.
> > 
> > No, just the currently running timeout.
> > 
> > Putting the barrier timeout on the end of the proc and todo cirqs
> > is because there's no CIRCQ_INSERT_HEAD I can use right now.
> > 
> > > Why do I want to wait for the whole softclock() or softclock_thread()?
> > > Why not just wait for the one timeout to finish?
> > 
> > Cos I was too lazy to write CIRCQ_INSERT_HEAD last night :D
> 
> Okay.  After thinking this over I'm pretty sure we are skinning the
> same cat in two different ways.
> 
> Your cond_wait(9) approach is fine for both timeout types because
> timeout_del_barrier(9) is only safe to call from process context.
> When I wrote my first diff I was under the impression it was safe to
> call timeout_del_barrier(9) from interrupt context.  But I reread the
> manpage and now I see that this is not the case, my bad.
> 
> > [...]
> > 
> > This discussion has some relevance to taskqs too. I was also lazy when I
> > implemented taskq_barrier and used task_add to put the barrier onto the
> > list of work, which meant it got added to the end. Now DRM relies on
> > this behaviour. Maybe I should have pushed to commit the diff below.
> > 
> > This diff lets taskq barriers run as soon as possible, but adds compat
> > to the drm workqueue stuff so they can wait for all pending work to
> > finish as they expect.
> > 
> > P.S. don't call timeout_del from inside a timeout handler. I still think
> > timeout_set_proc was a mistake.
> > 
> > [...]
> 
> Let's pick this whole train of thought (taskq, etc.) up in a
> different thread.  One thing at a time.
> 
> --
> 
> Updated patch:
> 
> - Keep timeout_barrier(9).
> 
> - In timeout_barrier(9) use the barrier timeout for both normal and
>   process-context timeouts.  All callers use cond_wait(9) now.
> 
> - Remove the kernel lock from the non-process path.  I assume I don't
>   need to splx(splschedclock()) anymore?
> 
> - Rename "timeout_proc_barrier()" to "timeout_barrier_timeout()"
>   because now we use it for both timeout types.
> 
> - Update timeout.9: timeout_barrier(9) no longer takes the kernel
>   lock.
> 
> Am I on the right track?  Aside from dropping the kernel lock there is
> no behavior change here.
> 
> We can talk about inserting the barrier timeouts at the head of the
> list to reduce latency in a different thread after we're done removing
> the kernel lock here.  Don't want to combine a behavior change with a
> locking change in the same revision.

yep.

ok by me, but be warned that i'm very distracted at the moment.

> Index: sys/kern/kern_timeout.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_timeout.c,v
> retrieving revision 1.83
> diff -u -p -r1.83 kern_timeout.c
> --- sys/kern/kern_timeout.c   8 Feb 2021 08:18:45 -0000       1.83
> +++ sys/kern/kern_timeout.c   6 May 2021 16:39:56 -0000
> @@ -172,10 +172,10 @@ void softclock_create_thread(void *);
>  void softclock_process_kclock_timeout(struct timeout *, int);
>  void softclock_process_tick_timeout(struct timeout *, int);
>  void softclock_thread(void *);
> +void timeout_barrier_timeout(void *);
>  uint32_t timeout_bucket(const struct timeout *);
>  uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
>  void timeout_run(struct timeout *);
> -void timeout_proc_barrier(void *);
>  
>  /*
>   * The first thing in a struct timeout is its struct circq, so we
> @@ -490,34 +490,38 @@ timeout_del_barrier(struct timeout *to)
>  void
>  timeout_barrier(struct timeout *to)
>  {
> -     int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> +     struct timeout barrier;
> +     struct cond c;
> +     int procflag;
> +
> +     procflag = (to->to_flags & TIMEOUT_PROC);
> +     timeout_sync_order(procflag);
> +
> +     timeout_set_flags(&barrier, timeout_barrier_timeout, &c, procflag);
> +     barrier.to_process = curproc->p_p;
> +     cond_init(&c);
>  
> -     timeout_sync_order(needsproc);
> -
> -     if (!needsproc) {
> -             KERNEL_LOCK();
> -             splx(splsoftclock());
> -             KERNEL_UNLOCK();
> -     } else {
> -             struct cond c = COND_INITIALIZER();
> -             struct timeout barrier;
> -
> -             timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> -             barrier.to_process = curproc->p_p;
> +     mtx_enter(&timeout_mutex);
>  
> -             mtx_enter(&timeout_mutex);
> -             SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> +     barrier.to_time = ticks;
> +     SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> +     if (procflag)
>               CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> -             mtx_leave(&timeout_mutex);
> +     else
> +             CIRCQ_INSERT_TAIL(&timeout_todo, &barrier.to_list);
> +
> +     mtx_leave(&timeout_mutex);
>  
> +     if (procflag)
>               wakeup_one(&timeout_proc);
> +     else
> +             softintr_schedule(softclock_si);
>  
> -             cond_wait(&c, "tmobar");
> -     }
> +     cond_wait(&c, "tmobar");
>  }
>  
>  void
> -timeout_proc_barrier(void *arg)
> +timeout_barrier_timeout(void *arg)
>  {
>       struct cond *c = arg;
>  
> Index: share/man/man9/timeout.9
> ===================================================================
> RCS file: /cvs/src/share/man/man9/timeout.9,v
> retrieving revision 1.52
> diff -u -p -r1.52 timeout.9
> --- share/man/man9/timeout.9  26 Apr 2021 20:32:30 -0000      1.52
> +++ share/man/man9/timeout.9  6 May 2021 16:39:56 -0000
> @@ -194,11 +194,6 @@ but it will wait until any current execu
>  ensures that any current execution of the timeout in the argument
>  .Fa to
>  has completed before returning.
> -If the timeout
> -.Fa to
> -has been initialised with
> -.Fn timeout_set
> -it will take the kernel lock.
>  .Pp
>  The
>  .Fn timeout_pending
