This is a race condition that should happen very infrequently (perhaps once a day on a loaded system), and on an unloaded system the hit would only be 10 msec and the race itself would be rarer still (maybe once a year). I agree that 10 msec is long these days, but unfortunately unix-ish systems cannot be counted on not to busy-spin when asked to delay for less than 10 msec. So while I agree with you in principle, in practice 10 msec is probably the safer choice.
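
Concretely, something like the following is what I have in mind in place of the ink_cond_wait at point "C" below. This is only a rough sketch against plain pthreads rather than the ink_cond/ink_mutex wrappers, and wait_for_work is just a placeholder name:

    #include <pthread.h>
    #include <time.h>

    /* Rough sketch only: wait for work for at most 10 msec while holding the
     * queue mutex.  The real patch would go through the ink_cond/ink_mutex
     * wrappers; this just shows the pthread_cond_timedwait pattern. */
    static void
    wait_for_work(pthread_cond_t *cond, pthread_mutex_t *mutex)
    {
      struct timespec deadline;
      clock_gettime(CLOCK_REALTIME, &deadline);
      deadline.tv_nsec += 10 * 1000 * 1000;      /* 10 msec from now */
      if (deadline.tv_nsec >= 1000000000L) {     /* carry nanoseconds into seconds */
        deadline.tv_nsec -= 1000000000L;
        deadline.tv_sec += 1;
      }
      /* Returns when signalled or when the deadline expires (ETIMEDOUT); either
       * way the caller loops around and re-checks the temp list and the work
       * queues, so a request stranded by the race waits at most ~10 msec. */
      pthread_cond_timedwait(cond, mutex, &deadline);
    }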
john

On Sat, Sep 10, 2011 at 12:38 PM, Theo Schlossnagle <je...@omniti.com> wrote:

> People don't deploy spinning disks much anymore. 10ms seems high.
> <<1ms for SSDs. Perhaps we should optimize for that instead?
>
> On Sat, Sep 10, 2011 at 3:13 PM, John Plevyak <jplev...@acm.org> wrote:
>
> > You are right. My preference would be to change this to a
> > pthread_cond_timedwait with a 10 msec timeout (or somesuch). The
> > rationale being that (hard) disk latency is in that range in any case,
> > and the chance of this happening is rare, so taking a 10 msec hit
> > would not be the end of the world.
> >
> > The other rationale is that it is a minimally invasive change.
> >
> > What do you think, Bart?
> >
> > john
> >
> > On Wed, Sep 7, 2011 at 7:34 AM, Bart Wyatt <wanderingb...@yooser.com> wrote:
> >
> >> I think I have identified a race condition that can erroneously place
> >> a new AIO request on the "temp" list without waking up a thread to
> >> service it. It seems that in most cases of this race condition the
> >> next request will rectify the issue; however, in cases such as cache
> >> volume initialization/recovery there are no additional requests
> >> issued and the initialization soft locks itself.
> >>
> >> The problem stems from the handling of the temp list itself. The
> >> servicing loop checks the temp list as follows:
> >>
> >>       ink_mutex_acquire(&my_aio_req->aio_mutex);
> >>       for (;;) {
> >>         do {
> >>           current_req = my_aio_req;
> >>           /* check if any pending requests on the atomic list */
> >> A>>>      if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> >>             aio_move(my_aio_req);
> >>           if (!(op = my_aio_req->aio_todo.pop()) &&
> >>               !(op = my_aio_req->http_aio_todo.pop()))
> >> B>>>        break;
> >>           <<blah blah blah, do the servicing>>
> >>         } while (1);
> >> C>>>    ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
> >>       }
> >>
> >> The thread holds the aio_mutex and checks whether the atomic list is
> >> empty; however, in the request-queuing code, writing to the atomic
> >> list happens outside of the mutex. The intent is probably to provide
> >> a faster request enqueue when lock contention is high:
> >>
> >>       if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> >> D>>>    ink_atomiclist_push(&req->aio_temp_list, op);
> >>       } else {
> >>         /* check if any pending requests on the atomic list */
> >>         if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> >>           aio_move(req);
> >>         /* now put the new request */
> >>         aio_insert(op, req);
> >>         ink_cond_signal(&req->aio_cond);
> >>         ink_mutex_release(&req->aio_mutex);
> >>       }
> >>
> >> When the servicing threads have no jobs, any requests atomically
> >> enqueued ("D") by another thread after "A" but before "C" will _not_
> >> get moved to the working queues and will _not_ signal the aio_cond.
> >> If N-1 of the cache disk AIO threads are waiting for a condition
> >> signal and the remaining service thread is in that "danger zone" when
> >> the initial read of the volume header is enqueued, it will end up on
> >> the temp list and never be serviced.
> >>
> >> In normal operation, the next request to acquire the mutex will move
> >> the requests from the temp queue to the working queues. This would
> >> potentially cause a servicing delay, but not a soft lock, as long as
> >> there is a steady stream of requests.
> >>
> >> I can implement a dirty fix for my current problem (soft lock on
> >> cache initialization every now and again). However, in order to
> >> implement a real fix I would need a better grasp on the requirements
> >> of the AIO system.
> >> For instance, are there typically far fewer request-producer threads
> >> than consumer threads (where is lock contention the most
> >> troublesome)? Also, it seems that the working queues are not atomic
> >> because they need to respect priority; however, only the cluster code
> >> ever sets the priority to something non-default.
> >>
> >> If priorities can be bucketed and the model is one or a few producers
> >> and many consumers, then it seems like the better choice is to
> >> implement a mutex that guards the enqueue to a set of atomic queues.
> >> Dequeues can run lockless until the queues are empty, in which case
> >> they would have to lock in order to guarantee that the queues are
> >> exhausted and the signal is handled correctly. Low producer counts
> >> reduce the lock contention on enqueue, and empty queues tend to be
> >> synonymous with low performance demands, so the lock should not be a
> >> big deal in that way.
> >>
> >> -Bart
>
> --
> Theo Schlossnagle
> http://omniti.com/is/theo-schlossnagle
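
For what it's worth, below is a rough sketch of the shape Bart describes: enqueue guarded by a mutex onto an atomic list, lockless pop-all on the dequeue side (roughly what aio_move already does for the temp list), and lock-and-re-check only when the list looks empty. It uses C++11 primitives and made-up type names (AIOOp, AIOQueue) rather than the real AIO structures, so take it only as an illustration of why the lost wakeup goes away:

    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    /* Made-up stand-ins for the AIO types; only the synchronization shape
       matters here. */
    struct AIOOp {
      AIOOp *next = nullptr;
      /* ... request payload ... */
    };

    struct AIOQueue {
      std::atomic<AIOOp *> atomic_list{nullptr}; /* producers push here */
      std::mutex enqueue_mutex;                  /* guards enqueue + signal */
      std::condition_variable cond;

      /* Producer: push and signal while holding the mutex, so a push can
         never slip in between a consumer's final emptiness check and its
         wait. */
      void enqueue(AIOOp *op) {
        std::lock_guard<std::mutex> lock(enqueue_mutex);
        op->next = atomic_list.load(std::memory_order_relaxed);
        while (!atomic_list.compare_exchange_weak(op->next, op,
                                                  std::memory_order_release))
          ; /* retry: a consumer may have swapped the list out from under us */
        cond.notify_one();
      }

      /* Consumer: grab the whole list locklessly; only when it comes up
         empty, take the mutex, re-check, and wait.  The re-check happens
         under the same mutex the producer holds, which closes the
         lost-wakeup window. */
      AIOOp *dequeue_all() {
        for (;;) {
          if (AIOOp *ops = atomic_list.exchange(nullptr, std::memory_order_acquire))
            return ops; /* caller services the whole (LIFO) batch */
          std::unique_lock<std::mutex> lock(enqueue_mutex);
          if (AIOOp *ops = atomic_list.exchange(nullptr, std::memory_order_acquire))
            return ops;
          cond.wait(lock);
        }
      }
    };

Since producers only contend on the mutex with each other and with consumers that have already found the list empty, the enqueue stays cheap when the system is busy, which matches Bart's point about low producer counts and idle queues.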