On 01/11/2012 05:12 PM, Ted Unangst wrote:
> On Wed, Jan 11, 2012, Chris Cappuccio wrote:
>> If only one disk is affected at a time, 5.0 is the fastest, and has the
>> most trouble with responsiveness while being fast. This is likely to be
>> improved by a fair I/O scheduler. There is a generic framework in place
>> now for schedulers to get plugged into, but I don't think anybody has
>> actually written one yet.
>> There's also an issue with dirty buffers getting eaten up, but that is
>> prominent on slow devices, and you'd be WAITing in buf_needva in that case.
> I don't think needva has been totally ruled out from what I've seen,
> though it's less likely. My other guess is that the RAID card itself
> prioritizes writes over reads, leading to a backlog of read requests.
I didn't follow the thread all the way back, so forgive me if this has
been covered. I'm betting that the disk subsystem and RAID controller
combination is choking on queued metadata writes. Some of the questions
below are aimed at the user, and some at people who know the system code.
User: Is the file system mounted with soft updates?
Would writes of the bitmaps, inodes, and indirect blocks have piled up?
Does turning off soft updates help?
What is the block/cluster size? What is the stripe size and RAID
configuration?
RAIDs are really slow doing the required read-modify-write on small writes.
The caching algorithm(s) in the cluster may be interfering with the
metadata writes.
When the file is read for the first time, with no metadata cached, does
the delay occur?
If the file is opened in update mode so that no new allocation is done,
does the delay occur? A trivial program might have to be written
(C, Python, Perl, LISP, COBOL, whatever).
Developers: Would the filesystem code write logically contiguous data
blocks out of order? If so, that could trigger read-modify-writes as well.
Has the soft update code changed to accumulate more metadata in core?
I don't know if there's any utility which can capture data about the
types of data in the disk queues. That would rule this out.
Again, if this has been covered, just ignore me.
Geoff Steckel