I think I have identified a race condition that can erroneously place
a new AIO request on the "temp" list without waking up a thread to
service it.  It seems that in most cases the next request rectifies
the issue; however, in cases such as cache volume
initialization/recovery no additional requests are issued, and the
initialization soft locks itself.

The problem stems from the handling of the temp list itself.  The
servicing loop checks the temp list like this:

  ink_mutex_acquire(&my_aio_req->aio_mutex);
  for (;;) {
    do {
      current_req = my_aio_req;
      /* check if any pending requests on the atomic list */
A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
        aio_move(my_aio_req);
      if (!(op = my_aio_req->aio_todo.pop()) &&
          !(op = my_aio_req->http_aio_todo.pop()))
B>>>    break;
      <<blah blah blah, do the servicing>>
    } while (1);
C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
  }

The thread holds the aio_mutex and checks whether the atomic list is
empty; however, in the request-queuing code, writes to the atomic list
happen outside of the mutex.  The intent is presumably to provide a
faster request enqueue when lock contention is high:

  if (!ink_mutex_try_acquire(&req->aio_mutex)) {
D>>>ink_atomiclist_push(&req->aio_temp_list, op);
  } else {
    /* check if any pending requests on the atomic list */
    if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
      aio_move(req);
    /* now put the new request */
    aio_insert(op, req);
    ink_cond_signal(&req->aio_cond);
    ink_mutex_release(&req->aio_mutex);
  }

When the servicing threads have no jobs, any request atomically
enqueued ("D") by another thread after "A" but before "C" will _not_
get moved to the working queues, and its enqueue will _not_ signal the
aio_cond.  If N-1 of the cache disk AIO threads are already waiting on
the condition variable and the remaining service thread is in that
"danger zone" when the initial read of the volume header is enqueued,
the request ends up on the temp list and is never serviced.
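
Spelled out as an interleaving (S = the one still-running service
thread, P = the thread enqueuing the request), the lost wakeup looks
like this:

  S: holds aio_mutex, sees an empty aio_temp_list at "A"
  S: pops nothing from aio_todo / http_aio_todo, breaks out at "B"
  P: ink_mutex_try_acquire() fails because S still holds aio_mutex
  P: pushes the op onto aio_temp_list at "D" and returns without
     signaling
  S: calls ink_cond_wait() at "C" and goes to sleep with the op
     stranded on the temp list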

In normal operation, the next enqueue that successfully acquires the
mutex will move the stranded requests from the temp list to the
working queues.  This can cause a servicing delay, but not a soft
lock, as long as there is a steady stream of requests.

I can implement a dirty fix for my current problem (the occasional
soft lock on cache initialization); one crude possibility is sketched
after this paragraph.  However, in order to implement a real fix I
would need a better grasp of the requirements of the AIO system.  For
instance, are there typically far fewer request-producer threads than
consumer threads (where is lock contention the most troublesome)?
Also, it seems that the working queues are not atomic because they
need to respect priority, yet only the cluster code ever sets the
priority to something non-default.
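
The crude fix would bound the stall rather than close the race: swap
the unconditional wait at "C" for a timed wait so a request stranded
on the temp list gets drained on the next wakeup.  This assumes a
timed-wait wrapper such as ink_cond_timedwait exists and takes an
absolute struct timespec deadline the way pthread_cond_timedwait does
(I have not verified that):

  /* bounded wait in place of "C"; on timeout the outer for (;;) loop
     re-runs the temp-list check at "A" */
  struct timespec deadline;
  deadline.tv_sec  = time(NULL) + 1;  /* re-check at least once a second */
  deadline.tv_nsec = 0;
  ink_cond_timedwait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex,
                     &deadline);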

If priorities can be bucketed and the model is one/few producers and
many consumers, then the better choice seems to be a mutex that guards
enqueues to a set of atomic queues.  Dequeues can run lockless until
the queues are empty, at which point the consumer has to take the lock
to guarantee that the queues are truly exhausted and that the signal
is handled correctly.  Low producer counts keep lock contention on
enqueue low, and empty queues tend to coincide with low performance
demands, so the lock should not be a big deal there either.
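
A rough sketch of that scheme, in plain pthread terms rather than the
real ATS primitives (aio_reqs, lf_queue and its push/pop are made-up
names for illustration; the only point is that every push happens
under the mutex while pops stay lockless until the queues look empty):

#include <pthread.h>

#define N_PRIORITIES 2                  /* e.g. default + cluster */

typedef struct lf_queue lf_queue;       /* assumed lock-free queue type */
extern void  lf_queue_push(lf_queue *q, void *op);
extern void *lf_queue_pop(lf_queue *q); /* returns NULL when empty */

typedef struct {
  pthread_mutex_t enq_mutex;            /* taken by every producer */
  pthread_cond_t  work_cond;
  lf_queue       *todo[N_PRIORITIES];   /* one atomic queue per priority bucket */
} aio_reqs;

/* producer: the push and its signal happen under enq_mutex, so a
   consumer that re-checks under the same mutex can never miss one */
void enqueue_op(aio_reqs *r, void *op, int prio)
{
  pthread_mutex_lock(&r->enq_mutex);
  lf_queue_push(r->todo[prio], op);
  pthread_cond_signal(&r->work_cond);
  pthread_mutex_unlock(&r->enq_mutex);
}

/* consumer: pop locklessly; only when every bucket looks empty, take
   the mutex, re-check exhaustively, and then wait -- that re-check
   under the lock is what closes the window described above */
void *dequeue_op(aio_reqs *r)
{
  for (;;) {
    void *op = NULL;
    for (int p = 0; !op && p < N_PRIORITIES; p++)
      op = lf_queue_pop(r->todo[p]);    /* fast path: no lock taken */
    if (op)
      return op;

    pthread_mutex_lock(&r->enq_mutex);
    for (int p = 0; !op && p < N_PRIORITIES; p++)
      op = lf_queue_pop(r->todo[p]);    /* exhaustive re-check under the lock */
    if (!op)
      pthread_cond_wait(&r->work_cond, &r->enq_mutex);
    pthread_mutex_unlock(&r->enq_mutex);
    if (op)
      return op;
  }
}

The wait and the final re-check share the mutex that every producer
takes, which is what rules out the lost wakeup; the common dequeue
path never touches the lock.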

-Bart
