Doug Gilbert and I ran across
some weirdness in the way the block device queues are plugged/unplugged.
It turned up in some benchmarks of the SCSI generics driver: with the new
queueing code, the generics driver inserts requests into the same queue
that block device requests are inserted into.
The oddness is this. We
were observing stalls in the processing of commands, which we traced to the
queue remaining plugged for an excessive amount of time. The
stalls last about 5 seconds.
Some investigation revealed that
part of the answer is that the bdflush daemon essentially forces a bunch of
dirty pages to be written to disk, but never bothers to unplug the queue when it
is done. The result is that the queue remains plugged until someone else
comes along and unplugs it. As it turns out, kupdate() does unplug the
queue, and kupdate runs every 5 seconds or so.
Patching bdflush to run tq_disk
after flushing buffers (i.e. before the schedule()) fixed *most* of the problem,
but evidently not all of it (Doug was still observing stalls, but a lot less
frequently). In other words, there is someone else out there queueing
requests in such a way that the queue can remain plugged for some amount of
time.
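
To make the mechanism concrete, here is a toy user-space model of the stall. It is only a sketch, not kernel code: struct toy_queue, toy_add_request(), toy_unplug() and toy_flush_latency() are hypothetical names invented for illustration, and the model assumes that an empty queue is plugged on its first insert and that a periodic kupdate pass is the only other thing that unplugs it.

```c
#include <stddef.h>

/* Assumed period of the kupdate pass, taken from the "every 5 seconds
 * or so" observation above. */
#define KUPDATE_INTERVAL 5

/* Hypothetical stand-in for a request queue; not the kernel structure. */
struct toy_queue {
    int plugged;  /* 1 while queued requests are being held back */
    int pending;  /* requests inserted but not yet dispatched */
};

/* Assumption: inserting into an empty queue plugs it. */
static void toy_add_request(struct toy_queue *q)
{
    if (q->pending == 0)
        q->plugged = 1;
    q->pending++;
}

/* Unplugging releases everything that is pending. */
static void toy_unplug(struct toy_queue *q)
{
    q->plugged = 0;
    q->pending = 0;
}

/*
 * Simulate one bdflush pass that queues a batch of writes, then count
 * how many whole seconds the requests wait before being dispatched.
 * unplug_after_flush models the patched bdflush (run tq_disk before
 * going to sleep); without it only the periodic kupdate pass helps.
 */
int toy_flush_latency(int unplug_after_flush)
{
    struct toy_queue q = { 0, 0 };
    int second = 0;
    int i;

    for (i = 0; i < 8; i++)
        toy_add_request(&q);

    if (unplug_after_flush)
        toy_unplug(&q);

    while (q.pending > 0) {
        second++;
        if (second % KUPDATE_INTERVAL == 0)
            toy_unplug(&q);  /* kupdate comes along and unplugs */
    }
    return second;
}
```

In this model toy_flush_latency(1) returns 0 and toy_flush_latency(0) returns KUPDATE_INTERVAL, which matches the roughly 5 second stalls. It does not model the remaining, less frequent stalls, since whoever causes those has not been identified.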
My gut tells me that it is wrong
for bdflush to not unplug the queue when it is done queueing I/O requests.
My gut also tells me that the generics driver probably wants to be unplugging
the one specific queue that it is using to ensure that I/O gets queued right
away (it doesn't make sense to unplug all queues in this instance).
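
The difference between unplugging one queue and unplugging everything can be sketched the same way. Again this is a user-space illustration under stated assumptions: struct dev_queue, dev_queue_add(), dev_queue_unplug() and unplug_all_queues() are hypothetical names, and unplug_all_queues() only stands in for the effect of running tq_disk.

```c
#include <stddef.h>

/* Hypothetical per-device queue; not the kernel structure. */
struct dev_queue {
    int plugged;  /* 1 while queued requests are being held back */
    int pending;  /* requests inserted but not yet dispatched */
};

/* Assumption: inserting into an empty queue plugs it. */
void dev_queue_add(struct dev_queue *q)
{
    if (q->pending == 0)
        q->plugged = 1;
    q->pending++;
}

/* Unplug a single queue; returns how many requests were released. */
int dev_queue_unplug(struct dev_queue *q)
{
    int released = q->pending;

    q->plugged = 0;
    q->pending = 0;
    return released;
}

/*
 * Stand-in for running tq_disk: unplug every queue, whether or not the
 * caller has anything queued on it. Returns the total released.
 */
int unplug_all_queues(struct dev_queue *queues, size_t n)
{
    int released = 0;
    size_t i;

    for (i = 0; i < n; i++)
        released += dev_queue_unplug(&queues[i]);
    return released;
}
```

A driver that calls dev_queue_unplug() on its own queue gets its requests started right away and leaves the other queues plugged, so requests on those queues can keep accumulating. unplug_all_queues() releases them all as a side effect.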
Comments?
-Eric