Right. So the situation is that the existing design dates from a time when
threads and scheduling for Unix were primitive. It was not uncommon in those
days for threads to "go away" for half a second, a second or more in a loaded
system. To deal with this, the current design is non-blocking all the way
through.

While I think this is not an ideal solution, for the reasons you mentioned
(re invasiveness/ease) I'd be willing to accept it. I think I would prefer to
fix the queuing so that it can't leave an unserviced request. But if the
"fix" of waking a thread every now and again is well documented, it ...

I don't think so, as this should be self-correcting in a busy system, as Bart
pointed out; the main concern is initialization or recovery.
I'll take care of this...
john
On Sun, Sep 11, 2011 at 1:59 PM, Leif Hedstrom wrote:
On 09/10/2011 02:17 PM, John Plevyak wrote:
This is a race condition which should happen very very infrequently (e.g.
once a day on a loaded system perhaps), and it would only be 10 msec on an
unloaded system, which would make it very very very infrequent (maybe once a
year in that case). I agree that 10 msec is long these days, but
unfortunately ...
People don't deploy spinning disks much anymore; 10 ms seems high. It's
<<1 ms for SSDs. Perhaps we should optimize for that instead?
On Sat, Sep 10, 2011 at 3:13 PM, John Plevyak wrote:
You are right. My preference would be to change this to a
pthread_cond_timedwait with a 10 msec timeout (or somesuch). The rationale
being that (hard) disk latency is in that range in any case, and the chance
of this happening is rare, so taking a 10 msec hit would not be the end of
the world. The ...
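
For concreteness, here is a rough sketch of the kind of worker loop such a
change suggests. The queue and request types below are made up for
illustration and are not the actual ATS AIO structures:

    #include <pthread.h>
    #include <time.h>

    /* Illustrative types only; the real AIO request/queue structures differ. */
    struct aio_req   { struct aio_req *next; };
    struct aio_queue {
        pthread_mutex_t lock;
        pthread_cond_t  not_empty;
        struct aio_req *head;                   /* pending requests */
    };

    void service_request(struct aio_req *req);  /* performs the actual disk I/O */

    /* Worker loop: rather than blocking forever on the condition variable,
     * wake up at least every 10 msec, so a request that was queued without a
     * matching signal still gets serviced after a bounded delay. */
    static void *aio_worker(void *arg)
    {
        struct aio_queue *q = arg;

        pthread_mutex_lock(&q->lock);
        for (;;) {
            while (q->head == NULL) {
                struct timespec deadline;
                clock_gettime(CLOCK_REALTIME, &deadline);
                deadline.tv_nsec += 10 * 1000 * 1000;          /* 10 msec */
                if (deadline.tv_nsec >= 1000000000L) {
                    deadline.tv_sec  += 1;
                    deadline.tv_nsec -= 1000000000L;
                }
                /* Returns ETIMEDOUT on timeout; either way, re-check the queue. */
                pthread_cond_timedwait(&q->not_empty, &q->lock, &deadline);
            }
            struct aio_req *req = q->head;
            q->head = req->next;
            pthread_mutex_unlock(&q->lock);
            service_request(req);
            pthread_mutex_lock(&q->lock);
        }
        return NULL;
    }

The same structure works with a shorter timeout (e.g. ~1 msec for SSDs, per
the note above); the timeout only bounds how long a lost wakeup can go
unnoticed and adds nothing to the normal signalled path.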
I think I have identified a race condition that can erroneously place a new
AIO request on the "temp" list without waking up a thread to service it. It
seems that in most cases of this race condition the next request will
rectify the issue; however, in cases such as cache volume
initialization/recovery ...
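
For readers skimming the thread, the shape of the bug being described is
roughly the classic lost-wakeup pattern sketched below. This is only an
illustration: the names (aio_enqueue, temp_list, sleeping) are invented here
and are not the actual ATS AIO code.

    #include <pthread.h>
    #include <stdbool.h>

    /* Hypothetical structures, for illustration only. */
    struct aio_req   { struct aio_req *next; };
    struct aio_queue {
        pthread_mutex_t lock;
        pthread_cond_t  not_empty;
        struct aio_req *temp_list;  /* requests parked for a busy worker to pick up */
        int             sleeping;   /* workers currently blocked in cond_wait */
    };

    /* Racy enqueue: the "does anyone need a wakeup?" decision and the append
     * to temp_list happen in separate critical sections. */
    void aio_enqueue(struct aio_queue *q, struct aio_req *r)
    {
        pthread_mutex_lock(&q->lock);
        bool need_signal = (q->sleeping > 0);
        pthread_mutex_unlock(&q->lock);

        /* <-- window: the last busy worker can finish its current request
         *     here, find temp_list empty, and block on the condition
         *     variable. */

        pthread_mutex_lock(&q->lock);
        r->next = q->temp_list;
        q->temp_list = r;
        if (need_signal)
            pthread_cond_signal(&q->not_empty);  /* stale decision: wakeup skipped */
        pthread_mutex_unlock(&q->lock);
    }

On a busy system the next enqueue call papers over the stranded request,
which matches the "self-correcting" observation earlier in the thread; during
cache volume initialization or recovery, where requests are sparse, it can
stall. Making the check and the append atomic with the worker's
check-then-sleep closes the window, while the timed wait sketched above only
bounds it.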