I think I have identified a race condition that can erroneously place a new AIO request on the "temp" list without waking up a thread to service it. In most cases the next request will rectify the issue; however, in cases such as cache volume initialization/recovery no additional requests are issued, and the initialization soft-locks itself.
The problem stems from the handling of the temp list itself. The servicing loop checks the temp list as follows:

  ink_mutex_acquire(&my_aio_req->aio_mutex);
  for (;;) {
    do {
      current_req = my_aio_req;
      /* check if any pending requests on the atomic list */
A>>>  if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
        aio_move(my_aio_req);
      if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
B>>>    break;
      <<blah blah blah, do the servicing>>
    } while (1);
C>>>ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
  }

The thread holds the aio_mutex while it checks whether the atomic list is empty; however, in the request queuing code, writing to the atomic list happens outside of the mutex. The intent is probably to provide a faster request enqueue when lock contention is high:

  if (!ink_mutex_try_acquire(&req->aio_mutex)) {
D>>>ink_atomiclist_push(&req->aio_temp_list, op);
  } else {
    /* check if any pending requests on the atomic list */
    if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
      aio_move(req);
    /* now put the new request */
    aio_insert(op, req);
    ink_cond_signal(&req->aio_cond);
    ink_mutex_release(&req->aio_mutex);
  }

When the servicing threads have no jobs, any request atomically enqueued ("D") by another thread after "A" but before "C" will _not_ get moved to the working queues and will _not_ signal the aio_cond. If N-1 of the cache disk AIO threads are already waiting on the condition variable and the remaining service thread is in that "danger zone" when the initial read of the volume header is enqueued, the request ends up on the temp list and is never serviced.

In normal operation, the next request to acquire the mutex will move the entries from the temp queue to the working queues. That can cause a servicing delay, but not a soft lock, as long as there is a steady stream of requests.

I can implement a dirty fix for my current problem (a soft lock on cache initialization every now and again). However, in order to implement a real fix I would need a better grasp of the requirements of the AIO system. For instance, are there typically far fewer request producer threads than consumer threads (where is lock contention the most troublesome)? Also, it seems that the working queues are not atomic because they need to respect priority, yet only the cluster code ever sets the priority to anything non-default.

If priorities can be bucketed and the model is one/few producers and many consumers, then the better choice seems to be a mutex that guards the enqueue onto a set of atomic queues. Dequeues can run lockless until the queues appear empty, at which point the consumer must take the lock to guarantee that the queues really are exhausted and that the wakeup signal is handled correctly. Low producer counts keep lock contention on enqueue small, and empty queues tend to be synonymous with low performance demands, so the lock should not be a big deal in that way. A rough sketch of that scheme follows.
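To make the idea concrete, here is a minimal sketch of that scheme, not actual Traffic Server code: pthread primitives stand in for ink_mutex/ink_cond, the AtomicList below is a simplified lock-free stack standing in for the existing atomic lists, and names like AIOQueueSet, aio_enqueue, and aio_dequeue are hypothetical. Producers always push and signal under the mutex; consumers pop locklessly and only take the mutex to re-check for work immediately before sleeping.

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stddef.h>

  typedef struct AIOOp {
    struct AIOOp *next;                /* link used by the lock-free list */
    /* ... request fields ... */
  } AIOOp;

  /* Simplified lock-free stack standing in for the existing atomic lists. */
  typedef struct {
    _Atomic(AIOOp *) head;
  } AtomicList;

  static void
  atomic_list_push(AtomicList *l, AIOOp *op)
  {
    AIOOp *old = atomic_load(&l->head);
    do {
      op->next = old;
    } while (!atomic_compare_exchange_weak(&l->head, &old, op));
  }

  /* Returns NULL when the list is empty. Good enough for a sketch; a real
     version has to handle ABA/reclamation the way the ATS atomic lists do. */
  static AIOOp *
  atomic_list_try_pop(AtomicList *l)
  {
    AIOOp *old = atomic_load(&l->head);
    while (old && !atomic_compare_exchange_weak(&l->head, &old, old->next))
      ;
    return old;
  }

  enum { N_PRIORITY_BUCKETS = 2 };     /* e.g. default vs. cluster priority */

  typedef struct {
    AtomicList      buckets[N_PRIORITY_BUCKETS];
    pthread_mutex_t enqueue_mutex;     /* ink_mutex in ATS terms */
    pthread_cond_t  not_empty;         /* ink_cond in ATS terms  */
  } AIOQueueSet;

  /* Producer: every enqueue takes the mutex, so a consumer that is about to
     sleep can never miss the push-and-signal pair. */
  void
  aio_enqueue(AIOQueueSet *qs, AIOOp *op, int bucket)
  {
    pthread_mutex_lock(&qs->enqueue_mutex);
    atomic_list_push(&qs->buckets[bucket], op);
    pthread_cond_signal(&qs->not_empty);
    pthread_mutex_unlock(&qs->enqueue_mutex);
  }

  /* Consumer: pop locklessly while work is available; only when every bucket
     looks empty do we take the mutex and re-check before waiting. A push that
     raced with the lockless scan is either found by the re-check or happens
     while we hold the mutex / wait on the condvar, so its signal is not lost. */
  AIOOp *
  aio_dequeue(AIOQueueSet *qs)
  {
    for (;;) {
      AIOOp *op = NULL;
      for (int b = 0; b < N_PRIORITY_BUCKETS && !op; b++)
        op = atomic_list_try_pop(&qs->buckets[b]);
      if (op)
        return op;                     /* fast path: no lock taken */

      pthread_mutex_lock(&qs->enqueue_mutex);
      for (int b = 0; b < N_PRIORITY_BUCKETS && !op; b++)
        op = atomic_list_try_pop(&qs->buckets[b]);
      if (!op)
        pthread_cond_wait(&qs->not_empty, &qs->enqueue_mutex);
      pthread_mutex_unlock(&qs->enqueue_mutex);
      if (op)
        return op;
    }
  }

The re-check under the lock is what closes the "A"/"C" window described above: a consumer can only go to sleep after confirming emptiness while holding the same mutex that every producer must hold to push and signal.

-Bart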