People don't deploy spinning disks much anymore, so 10ms seems high; SSD latency is <<1ms. Perhaps we should optimize for that instead?
On Sat, Sep 10, 2011 at 3:13 PM, John Plevyak <jplev...@acm.org> wrote:
> You are right. My preference would be to change this to a
> pthread_cond_timedwait with a 10 msec timeout (or some such). The
> rationale being that (hard) disk latency is in that range in any case,
> and the chance of this happening is rare, so taking a 10 msec hit would
> not be the end of the world.
>
> The other rationale is that it is a minimally invasive change.
>
> What do you think, Bart?
>
> john
>
>
> On Wed, Sep 7, 2011 at 7:34 AM, Bart Wyatt <wanderingb...@yooser.com> wrote:
>
>> I think I have identified a race condition that can erroneously place
>> a new AIO request on the "temp" list without waking up a thread to
>> service it. In most cases of this race condition the next request will
>> rectify the issue; however, in cases such as cache volume
>> initialization/recovery there are no additional requests issued and the
>> initialization soft-locks itself.
>>
>> The problem stems from the handling of the temp list itself. The
>> servicing loop checks the temp list as follows:
>>
>>     ink_mutex_acquire(&my_aio_req->aio_mutex);
>>     for (;;) {
>>       do {
>>         current_req = my_aio_req;
>>         /* check if any pending requests on the atomic list */
>> A>>>    if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
>>           aio_move(my_aio_req);
>>         if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
>> B>>>      break;
>>         <<blah blah blah, do the servicing>>
>>       } while (1);
>> C>>>  ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
>>     }
>>
>> The thread holds the aio_mutex and checks whether the atomic list is
>> empty; however, in the request-queuing code, writing to the atomic list
>> happens outside of the mutex. The intent is probably to provide a
>> faster request enqueue when lock contention is high:
>>
>>     if (!ink_mutex_try_acquire(&req->aio_mutex)) {
>> D>>>  ink_atomiclist_push(&req->aio_temp_list, op);
>>     } else {
>>       /* check if any pending requests on the atomic list */
>>       if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
>>         aio_move(req);
>>       /* now put the new request */
>>       aio_insert(op, req);
>>       ink_cond_signal(&req->aio_cond);
>>       ink_mutex_release(&req->aio_mutex);
>>     }
>>
>> When the servicing threads have no jobs, any request atomically
>> enqueued ("D") by another thread after "A" but before "C" will _not_
>> get moved to the working queues and will _not_ signal the aio_cond.
>> If N-1 of the cache-disk AIO threads are waiting for a condition
>> signal and the remaining service thread is in that "danger zone" when
>> the initial read of the volume header is enqueued, the read will end
>> up on the temp list and never be serviced.
>>
>> In normal operation, the next request to acquire the mutex will move
>> the requests from the temp queue to the working queues. This could
>> cause a servicing delay, but not a soft lock, as long as there is a
>> steady stream of requests.
>>
>> I can implement a dirty fix for my current problem (a soft lock on
>> cache initialization every now and again). However, in order to
>> implement a real fix I would need a better grasp of the requirements
>> of the AIO system. For instance, are there typically far fewer
>> request-producer threads than consumer threads (where is lock
>> contention the most troublesome)? Also, it seems that the working
>> queues are not atomic, as they need to respect priority; however, only
>> the cluster code ever sets the priority to something non-default.
>>
>> If priorities can be bucketed and the model is one/few producers and
>> many consumers, then it seems like the better choice is to implement a
>> mutex that guards the enqueue to a set of atomic queues. Dequeues can
>> run lockless until the queues are empty, in which case they would have
>> to lock in order to guarantee that the queues are exhausted and the
>> signal is handled correctly. Low producer counts reduce the lock
>> contention on enqueue, and empty queues tend to be synonymous with low
>> performance demands, so the lock should not be a big deal in that way.
>>
>> -Bart
>>

--
Theo Schlossnagle
http://omniti.com/is/theo-schlossnagle
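
[For reference, a minimal sketch of the timedwait workaround John describes,
written against plain pthreads rather than the ink_* wrappers; the queue
type, helper names, and the 10 msec constant are illustrative, not the
actual ATS code. The idle servicing thread waits with a bounded timeout
instead of indefinitely, so a request that raced onto the temp list during
the A..C window is picked up on the next pass even though the signal was
missed:

    #include <pthread.h>
    #include <time.h>

    /* Illustrative stand-ins for the real AIO queue and its helpers. */
    typedef struct {
        pthread_mutex_t aio_mutex;
        pthread_cond_t  aio_cond;
        /* atomic temp list and working queues omitted */
    } aio_queue_t;

    int   temp_list_nonempty(aio_queue_t *q);   /* ~ !INK_ATOMICLIST_EMPTY()       */
    void  move_temp_to_working(aio_queue_t *q); /* ~ aio_move()                    */
    void *pop_work(aio_queue_t *q);             /* ~ aio_todo/http_aio_todo pop()  */
    void  service(void *op);

    static void service_loop(aio_queue_t *q)
    {
        pthread_mutex_lock(&q->aio_mutex);
        for (;;) {
            void *op;
            do {
                if (temp_list_nonempty(q))
                    move_temp_to_working(q);
                if (!(op = pop_work(q)))
                    break;
                service(op); /* the real code drops the mutex around the I/O */
            } while (1);

            /* Wait at most ~10 msec, then re-check the temp list. A request
             * pushed onto the atomic list between the emptiness check and
             * this wait is then serviced on the next iteration even if no
             * signal was ever sent. */
            struct timespec deadline;
            clock_gettime(CLOCK_REALTIME, &deadline);
            deadline.tv_nsec += 10 * 1000 * 1000;
            if (deadline.tv_nsec >= 1000000000L) {
                deadline.tv_sec  += 1;
                deadline.tv_nsec -= 1000000000L;
            }
            pthread_cond_timedwait(&q->aio_cond, &q->aio_mutex, &deadline);
        }
    }

The bounded wait trades a rare 10 msec delay for not having to touch the
lock-free enqueue path; Bart's alternative (mutex-guarded enqueue with
lockless dequeue until the queues drain) avoids the periodic wakeup but is
a larger change.]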