Right. So the situation is that the existing design dates from a time when threading and scheduling on Unix were primitive. It was not uncommon in those days for threads to "go away" for half a second, a second, or more on a loaded system.
To deal with this, the current design is non-blocking all the way down. This means that no (correctly written) code ever blocks on a mutex (we use try_lock), which is the origin of that AIO code. Threading and scheduling have since improved dramatically, and the problems that motivated that design decision have (most probably :) gone away. It would be dramatically easier to drop this requirement and write the code with small critical sections, simply blocking if we miss a mutex. First, that is the way most programs are written these days, and second, having to back out of the entire stack while saving context is a huge burden: it makes the code hard to read, write, and debug.

The upshot is that my suggestion to just add a 10 msec timeout is for the short term. I'd like to see a blocking, small-critical-section AIO/Cache/RamCache patch performance-tested against the existing code, *then* pull the trigger on such a new design and fix the offending code (including the AIO race). I plan to work on such a patch and would be very interested in talking to anyone else who is interested in such a thing as well.

john

On Mon, Sep 12, 2011 at 7:03 AM, Bart Wyatt <wanderingb...@yooser.com> wrote:

> While I think this is not an ideal solution, for the reasons you
> mentioned (re invasiveness/ease) I'd be willing to accept it.
>
> I think I would prefer to fix the queuing so that it can't leave an
> unserviced request, but if the "fix" of waking a thread every now and
> again is well documented it should be OK for now. I would put a large
> comment in the thread loop indicating that we are aware of the race
> and chose to fix it by waking the thread on an interval. It would
> surely save some future code-detective some time if they want to know
> why their request sometimes waits ~10 msec on an otherwise idle
> instance. It may also prevent some future "optimizer" from
> re-introducing the bug by removing the interval on the conditional
> wait.
>
> On Sun, Sep 11, 2011 at 4:35 PM, John Plevyak <jplev...@acm.org> wrote:
> > I don't think so, as this should be self-correcting in a busy system,
> > as Bart pointed out; the main concern is initialization or recovery.
> >
> > I'll take care of this...
> >
> > john
> >
> > On Sun, Sep 11, 2011 at 1:59 PM, Leif Hedstrom <zw...@apache.org> wrote:
> >
> >> On 09/10/2011 02:17 PM, John Plevyak wrote:
> >>
> >>> This is a race condition which should happen very infrequently (e.g.
> >>> once a day on a loaded system, perhaps), and it would only cost 10 msec
> >>> on an unloaded system, which would make it very, very infrequent there
> >>> (maybe once a year). I agree that 10 msec is long these days, but
> >>> unfortunately unix-ish systems cannot be counted on not to busy-spin
> >>> when delaying for less than 10 msec. So while I agree with you in
> >>> principle, in practice 10 msec is probably safer.
> >>
> >> Agreed. So, is there a bug filed for this? Is there an easy fix? John,
> >> I'm wondering if this is the same problem Yahoo Search was having, where
> >> once in a while a cache would go MIA (unavailable) until they restarted
> >> the server. On their busy boxes, it'd happen after ~2 weeks or so.
> >>
> >> -- leif
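[Editor's note: for the archives, here is a minimal sketch of the "wake the thread on an interval" idea discussed above. It is not the actual Traffic Server AIO code or the proposed patch; the names (AioQueue, AioRequest, service_request) and the use of std::condition_variable are stand-ins for ATS's internal primitives. The point is only that the worker waits with a ~10 msec timeout instead of sleeping indefinitely, so a request queued across the race window is picked up within one interval.]

```cpp
// Sketch only: "wake the thread on an interval" as an interim guard against
// a lost-wakeup race.  AioQueue/AioRequest/service_request are invented
// names, not the real ATS types.  (Shutdown handling is omitted.)
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

struct AioRequest { /* file, offset, buffer, callback, ... */ };

class AioQueue {
public:
  void enqueue(AioRequest req) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(req));
    }
    // In this sketch the producer holds the same lock, so the signal cannot
    // be lost; in the real non-blocking (try_lock) code it can be missed.
    cond_.notify_one();
  }

  // Worker thread body.
  void worker_loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      // We are aware of the race between queuing a request and the worker
      // going to sleep.  Rather than restructuring the queuing, bound the
      // damage: wake at least every 10 msec and re-check the queue, so a
      // request stranded by a missed wakeup waits at most ~10 msec.
      // Do NOT "optimize" this back to an untimed wait without fixing
      // the underlying race.
      while (queue_.empty()) {
        cond_.wait_for(lock, std::chrono::milliseconds(10));
      }
      AioRequest req = std::move(queue_.front());
      queue_.pop_front();

      lock.unlock();
      service_request(req);  // do the actual I/O outside the critical section
      lock.lock();
    }
  }

private:
  void service_request(AioRequest &) { /* perform the I/O, run the callback */ }

  std::mutex mutex_;
  std::condition_variable cond_;
  std::deque<AioRequest> queue_;
};
```

The blocking, small-critical-section design John proposes would make the enqueue and the wait share one short lock, as in the sketch, at which point the interval wake becomes a belt-and-suspenders measure rather than a correctness requirement.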