This is a race condition that should happen very infrequently (perhaps once a day on a loaded system), and on an unloaded system the hit would only be 10 msec and the race itself would be rarer still (maybe once a year). I agree that 10 msec is long these days, but unfortunately unix-ish systems cannot be counted on not to busy-spin when asked to delay for less than 10 msec. So while I agree with you in principle, in practice 10 msec is probably the safer choice.
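
Concretely, something like the following is what I have in mind in place of the ink_cond_wait at point "C" below. This is only a rough sketch against plain pthreads rather than the ink_cond/ink_mutex wrappers, and wait_for_work is just a placeholder name:

    #include <pthread.h>
    #include <time.h>

    /* Rough sketch only: wait for work for at most 10 msec while holding the
     * queue mutex.  The real patch would go through the ink_cond/ink_mutex
     * wrappers; this just shows the pthread_cond_timedwait pattern. */
    static void
    wait_for_work(pthread_cond_t *cond, pthread_mutex_t *mutex)
    {
      struct timespec deadline;
      clock_gettime(CLOCK_REALTIME, &deadline);
      deadline.tv_nsec += 10 * 1000 * 1000;      /* 10 msec from now */
      if (deadline.tv_nsec >= 1000000000L) {     /* carry nanoseconds into seconds */
        deadline.tv_nsec -= 1000000000L;
        deadline.tv_sec += 1;
      }
      /* Returns when signalled or when the deadline expires (ETIMEDOUT); either
       * way the caller loops around and re-checks the temp list and the work
       * queues, so a request stranded by the race waits at most ~10 msec. */
      pthread_cond_timedwait(cond, mutex, &deadline);
    }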
john

On Sat, Sep 10, 2011 at 12:38 PM, Theo Schlossnagle <je...@omniti.com> wrote:

> People don't deploy spinning disks much anymore. 10ms seems high.
> <<1ms for SSDs. Perhaps we should optimize for that instead?
>
> On Sat, Sep 10, 2011 at 3:13 PM, John Plevyak <jplev...@acm.org> wrote:
>
> > You are right. My preference would be to change this to a
> > pthread_cond_timedwait with a 10 msec timeout (or somesuch). The
> > rationale being that (hard) disk latency is in that range in any case,
> > and the chance of this happening is rare, so taking a 10 msec hit
> > would not be the end of the world.
> >
> > The other rationale is that it is a minimally invasive change.
> >
> > What do you think, Bart?
> >
> > john
> >
> > On Wed, Sep 7, 2011 at 7:34 AM, Bart Wyatt <wanderingb...@yooser.com> wrote:
> >
> >> I think I have identified a race condition that can erroneously place
> >> a new AIO request on the "temp" list without waking up a thread to
> >> service it. It seems that in most cases of this race condition the
> >> next request will rectify the issue; however, in cases such as cache
> >> volume initialization/recovery there are no additional requests
> >> issued and the initialization soft locks itself.
> >>
> >> The problem stems from the handling of the temp list itself. The
> >> servicing loop checks the temp list as follows:
> >>
> >>       ink_mutex_acquire(&my_aio_req->aio_mutex);
> >>       for (;;) {
> >>         do {
> >>           current_req = my_aio_req;
> >>           /* check if any pending requests on the atomic list */
> >> A>>>      if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> >>             aio_move(my_aio_req);
> >>           if (!(op = my_aio_req->aio_todo.pop()) &&
> >>               !(op = my_aio_req->http_aio_todo.pop()))
> >> B>>>        break;
> >>           <<blah blah blah, do the servicing>>
> >>         } while (1);
> >> C>>>    ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> >>       }
> >>
> >> The thread holds the aio_mutex and checks whether the atomic list is
> >> empty; however, in the request-queuing code, writing to the atomic
> >> list happens outside of the mutex. The intent is probably to provide
> >> a faster request enqueue when lock contention is high:
> >>
> >>       if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> >> D>>>    ink_atomiclist_push(&req->aio_temp_list, op);
> >>       } else {
> >>         /* check if any pending requests on the atomic list */
> >>         if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> >>           aio_move(req);
> >>         /* now put the new request */
> >>         aio_insert(op, req);
> >>         ink_cond_signal(&req->aio_cond);
> >>         ink_mutex_release(&req->aio_mutex);
> >>       }
> >>
> >> When the servicing threads have no jobs, any requests atomically
> >> enqueued ("D") by another thread after "A" but before "C" will _not_
> >> get moved to the working queues and will _not_ signal the aio_cond.
> >> If N-1 of the cache disk AIO threads are waiting for a condition
> >> signal and the remaining service thread is in that "danger zone" when
> >> the initial read of the volume header is enqueued, it will end up on
> >> the temp list and never be serviced.
> >>
> >> In normal operation, the next request to acquire the mutex will move
> >> the requests from the temp queue to the working queues. This would
> >> potentially cause a servicing delay, but not a soft lock, as long as
> >> there is a steady stream of requests.
> >>
> >> I can implement a dirty fix for my current problem (soft lock on
> >> cache initialization every now and again). However, in order to
> >> implement a real fix I would need a better grasp on the requirements
> >> of the AIO system.
> >> For instance, are there typically far fewer request-producer threads
> >> than consumer threads (where is lock contention the most
> >> troublesome)? Also, it seems that the working queues are not atomic
> >> because they need to respect priority; however, only the cluster code
> >> ever sets the priority to something non-default.
> >>
> >> If priorities can be bucketed and the model is one or a few producers
> >> and many consumers, then it seems like the better choice is to
> >> implement a mutex that guards the enqueue to a set of atomic queues.
> >> Dequeues can run lockless until the queues are empty, in which case
> >> they would have to lock in order to guarantee that the queues are
> >> exhausted and the signal is handled correctly. Low producer counts
> >> reduce the lock contention on enqueue, and empty queues tend to be
> >> synonymous with low performance demands, so the lock should not be a
> >> big deal in that way.
> >>
> >> -Bart
>
> --
> Theo Schlossnagle
> http://omniti.com/is/theo-schlossnagle
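
For what it's worth, below is a rough sketch of the shape Bart describes: enqueue guarded by a mutex onto an atomic list, lockless pop-all on the dequeue side (roughly what aio_move already does for the temp list), and lock-and-re-check only when the list looks empty. It uses C++11 primitives and made-up type names (AIOOp, AIOQueue) rather than the real AIO structures, so take it only as an illustration of why the lost wakeup goes away:

    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    /* Made-up stand-ins for the AIO types; only the synchronization shape
       matters here. */
    struct AIOOp {
      AIOOp *next = nullptr;
      /* ... request payload ... */
    };

    struct AIOQueue {
      std::atomic<AIOOp *> atomic_list{nullptr}; /* producers push here */
      std::mutex enqueue_mutex;                  /* guards enqueue + signal */
      std::condition_variable cond;

      /* Producer: push and signal while holding the mutex, so a push can
         never slip in between a consumer's final emptiness check and its
         wait. */
      void enqueue(AIOOp *op) {
        std::lock_guard<std::mutex> lock(enqueue_mutex);
        op->next = atomic_list.load(std::memory_order_relaxed);
        while (!atomic_list.compare_exchange_weak(op->next, op,
                                                  std::memory_order_release))
          ; /* retry: a consumer may have swapped the list out from under us */
        cond.notify_one();
      }

      /* Consumer: grab the whole list locklessly; only when it comes up
         empty, take the mutex, re-check, and wait.  The re-check happens
         under the same mutex the producer holds, which closes the
         lost-wakeup window. */
      AIOOp *dequeue_all() {
        for (;;) {
          if (AIOOp *ops = atomic_list.exchange(nullptr, std::memory_order_acquire))
            return ops; /* caller services the whole (LIFO) batch */
          std::unique_lock<std::mutex> lock(enqueue_mutex);
          if (AIOOp *ops = atomic_list.exchange(nullptr, std::memory_order_acquire))
            return ops;
          cond.wait(lock);
        }
      }
    };

Since producers only contend on the mutex with each other and with consumers that have already found the list empty, the enqueue stays cheap when the system is busy, which matches Bart's point about low producer counts and idle queues.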