On Jun 19, 2011, at 6:04 AM, Andrew Gabriel wrote:

> Richard Elling wrote:
>> Actually, all of the data I've gathered recently shows that the number
>> of IOPS does not significantly increase for HDDs running random
>> workloads. However, the response time does :-( My data is leading me
>> to want to restrict the queue depth to 1 or 2 for HDDs.
>
> Thinking out loud here, but if you can queue up enough random I/Os, the
> embedded disk controller can probably do a good job of reordering them
> into a less random elevator-sweep pattern, increasing IOPS by reducing
> the total seek time. That may be why IOPS does not drop as much as one
> might imagine if you think of the heads doing random seeks (they aren't
> random any more). However, this requires a reasonable queue of I/Os for
> the controller to optimise, and processing that queue will necessarily
> increase the average response time. If you run with a queue depth of
> 1 or 2, the controller can't do this.
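To put numbers on that, here is a toy model of the win from reordering,
assuming seek cost is simply proportional to block-number distance
(rotational latency and real disk geometry ignored; the block numbers
are made up):

/*
 * Toy model: total head travel for a batch of random requests,
 * serviced FIFO vs. reordered into a single ascending elevator sweep.
 * Block numbers stand in for head position; real firmware also
 * accounts for rotational position, not just seek distance.
 */
#include <stdio.h>
#include <stdlib.h>

static int
cmp_blk(const void *a, const void *b)
{
        unsigned long x = *(const unsigned long *)a;
        unsigned long y = *(const unsigned long *)b;
        return ((x > y) - (x < y));
}

/* Sum of head movements when reqs[] is serviced in the order given. */
static unsigned long
seek_total(unsigned long head, const unsigned long *reqs, int n)
{
        unsigned long total = 0;
        int i;

        for (i = 0; i < n; i++) {
                total += (reqs[i] > head) ? reqs[i] - head : head - reqs[i];
                head = reqs[i];
        }
        return (total);
}

int
main(void)
{
        unsigned long head = 50000;
        unsigned long fifo[] = { 98000, 1800, 73500, 12000,
            64000, 4200, 88000, 31000 };
        int n = sizeof (fifo) / sizeof (fifo[0]);
        unsigned long sweep[8];
        int i;

        for (i = 0; i < n; i++)
                sweep[i] = fifo[i];
        qsort(sweep, n, sizeof (sweep[0]), cmp_blk); /* one ascending sweep */

        printf("FIFO  head travel: %lu\n", seek_total(head, fifo, n));
        printf("sweep head travel: %lu\n", seek_total(head, sweep, n));
        /* With a queue depth of 1, the controller only ever sees FIFO. */
        return (0);
}

With this batch the sweep order needs roughly a quarter of the head
travel of FIFO order, which is the effect Andrew describes; the cost is
that requests landing late in the sweep wait longer, i.e. a higher
average and worst-case response time.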
I agree. And disksort is in the mix, too.

> This is something I played with ~30 years ago, when the OS disk driver
> was responsible for queuing and reordering disk transfers to reduce
> total seek time, and disk controllers were dumb.

...and disksort still survives... maybe we should kill it?

> There are lots of options and compromises, generally weighing reduction
> in total seek time against longest response time. The best reduction in
> total seek time comes from planning out your elevator sweep and
> inserting newly queued requests into the right position in the sweep
> ahead. That also gives the potentially worst response time, as you may
> have one transfer queued for the far end of the disk whilst you keep
> getting new transfers queued for the track just in front of you, and
> you might end up reading or writing the whole disk before you get to do
> that transfer which is queued for the far end. If you can get a big
> enough queue, you can modify the insertion algorithm to never insert
> into the current sweep, so you are effectively planning two sweeps
> ahead. Then the worst response time becomes the time to process one
> queue full, rather than the time to read or write the whole disk. Lots
> of other tricks too (e.g. insertion into sweeps taking into account
> priority, such as whether the I/O is synchronous or asynchronous, and
> the age of existing queue entries). I had much fun playing with this at
> the time.

The other wrinkle for ZFS is that the priority scheduler can't re-order
I/Os once they have been sent to the disk. So it might make better sense
for ZFS to keep the disk queue depth small for HDDs. (A sketch of the
two-sweeps-ahead insertion appears below.)
 -- richard
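For the curious, a minimal sketch of the "never insert into the current
sweep" rule. The structures and names (sweep_queue, enqueue, dequeue)
are hypothetical, purely for illustration -- as I understand it,
disksort(9F) keeps a single one-way elevator list and does insert into
the sweep in progress:

/*
 * Two queues: the sweep being serviced (never modified by arrivals)
 * and the sweep being planned.  A request arriving "just in front of
 * the heads" cannot jump into the current sweep, so the worst-case
 * wait is bounded by one queue-full rather than a whole-disk pass.
 */
#include <stdio.h>
#include <stdlib.h>

struct ioreq {
        unsigned long blkno;
        struct ioreq *next;
};

struct sweep_queue {
        struct ioreq *cur;   /* sweep in progress, ascending order */
        struct ioreq *plan;  /* next sweep, where arrivals go */
};

/* Insert in ascending block order into the planned sweep only. */
static void
enqueue(struct sweep_queue *q, unsigned long blkno)
{
        struct ioreq *r = malloc(sizeof (*r));
        struct ioreq **pp = &q->plan;

        r->blkno = blkno;
        while (*pp != NULL && (*pp)->blkno < blkno)
                pp = &(*pp)->next;
        r->next = *pp;
        *pp = r;
}

/* Next request to service; promote the planned sweep once the
 * current one drains. */
static struct ioreq *
dequeue(struct sweep_queue *q)
{
        struct ioreq *r;

        if (q->cur == NULL) {
                q->cur = q->plan;
                q->plan = NULL;
        }
        r = q->cur;
        if (r != NULL)
                q->cur = r->next;
        return (r);
}

int
main(void)
{
        struct sweep_queue q = { NULL, NULL };
        struct ioreq *r;
        int i;

        enqueue(&q, 3000);
        enqueue(&q, 45000);
        enqueue(&q, 70000);

        /* Service two requests of the first sweep... */
        for (i = 0; i < 2; i++) {
                r = dequeue(&q);
                printf("sweep 1: blkno %lu\n", r->blkno);
                free(r);
        }

        /* ...then a request lands just ahead of the heads.  It joins
         * the next sweep: 70000 is still serviced first. */
        enqueue(&q, 46000);

        while ((r = dequeue(&q)) != NULL) {
                printf("service: blkno %lu\n", r->blkno);
                free(r);
        }
        return (0);
}

Priority and queue-entry age, which Andrew mentions, would be extra
inputs to where (or how many sweeps ahead) enqueue places a request.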