For SSDs, we have code in illumos that disables disksort. Ultimately, we believe the cost of disksort is in the noise for performance.
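To illustrate what is being skipped, here is a minimal sketch in C of the general idea: keep the pending queue sorted by block number when the media has a seek penalty, and fall back to plain FIFO when it doesn't. This is not the actual illumos sd/disksort code; the io_queue/io_req layout and the is_rotational flag are assumptions made up for the example.

/*
 * Minimal sketch (not the illumos disksort code): sorted insertion for
 * rotating media, plain FIFO for SSDs where ordering buys nothing.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct io_req {
	uint64_t	 blkno;		/* starting block of the transfer */
	struct io_req	*next;
};

struct io_queue {
	struct io_req	*head;
	bool		 is_rotational;	/* false for SSDs (assumed flag) */
};

/* Queue a request, sorting only when seek order actually matters. */
static void
io_enqueue(struct io_queue *q, struct io_req *r)
{
	struct io_req **pp = &q->head;

	if (q->is_rotational) {
		/* Rotating media: insert in ascending block order. */
		while (*pp != NULL && (*pp)->blkno <= r->blkno)
			pp = &(*pp)->next;
	} else {
		/* SSD: no seek penalty, so just append at the tail (FIFO). */
		while (*pp != NULL)
			pp = &(*pp)->next;
	}
	r->next = *pp;
	*pp = r;
}

int
main(void)
{
	struct io_queue q = { NULL, true };	/* pretend rotating media */
	struct io_req reqs[4] = {
		{ 900, NULL }, { 10, NULL }, { 500, NULL }, { 20, NULL }
	};
	struct io_req *r;
	size_t i;

	for (i = 0; i < 4; i++)
		io_enqueue(&q, &reqs[i]);

	/* Prints 10 20 500 900: one ascending sweep for the heads. */
	for (r = q.head; r != NULL; r = r->next)
		printf("%llu ", (unsigned long long)r->blkno);
	printf("\n");
	return (0);
}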
-- Garrett D'Amore

On Jun 20, 2011, at 8:38 AM, "Andrew Gabriel" <andrew.gabr...@oracle.com> wrote:

> Richard Elling wrote:
>> On Jun 19, 2011, at 6:04 AM, Andrew Gabriel wrote:
>>
>>> Richard Elling wrote:
>>>
>>>> Actually, all of the data I've gathered recently shows that the number
>>>> of IOPS does not significantly increase for HDDs running random
>>>> workloads. However, the response time does :-( My data is leading me
>>>> to want to restrict the queue depth to 1 or 2 for HDDs.
>>>>
>>> Thinking out loud here, but if you can queue up enough random I/Os, the
>>> embedded disk controller can probably do a good job reordering them into
>>> a less random elevator-sweep pattern, and increase IOPS by reducing the
>>> total seek time, which may be why IOPS does not drop as much as one
>>> might imagine if you think of the heads doing random seeks (they aren't
>>> random anymore). However, this requires that there's a reasonable queue
>>> of I/Os for the controller to optimise, and processing that queue will
>>> necessarily increase the average response time. If you run with a queue
>>> depth of 1 or 2, the controller can't do this.
>>>
>>
>> I agree. And disksort is in the mix, too.
>>
>
> Oh, I'd never looked at that.
>
>>> This is something I played with ~30 years ago, when the OS disk driver
>>> was responsible for queuing and reordering disc transfers to reduce
>>> total seek time, and disk controllers were dumb.
>>>
>>
>> ...and disksort still survives... maybe we should kill it?
>>
>
> It looks like it's possibly slightly worse than the pathological
> worst-response-time case I described below...
>
>>> There are lots of options and compromises, generally weighing reduction
>>> in total seek time against longest response time. The best reduction in
>>> total seek time comes from planning out your elevator sweep and
>>> inserting newly queued requests into the right position in the sweep
>>> ahead. That also gives the potentially worst response time, as you may
>>> have one transfer queued for the far end of the disk, whilst you keep
>>> getting new transfers queued for the track just in front of you, and
>>> you might end up reading or writing the whole disk before you get to do
>>> that transfer which is queued for the far end. If you can get a big
>>> enough queue, you can modify the insertion algorithm to never insert
>>> into the current sweep, so you are effectively planning two sweeps
>>> ahead. Then the worst response time becomes the time to process one
>>> queue full, rather than the time to read or write the whole disk. There
>>> are lots of other tricks too (e.g. insertion into sweeps taking into
>>> account priority, such as whether the I/O is synchronous or
>>> asynchronous, and the age of existing queue entries). I had much fun
>>> playing with this at the time.
>>>
>>
>> The other wrinkle for ZFS is that the priority scheduler can't re-order
>> I/Os sent to the disk.
>>
>
> Does that also go through disksort? Disksort doesn't seem to have any
> concept of priorities (but I haven't looked in detail where it plugs in
> to the whole framework).
>
>> So it might make better sense for ZFS to keep the disk queue depth small
>> for HDDs.
>> -- richard
>>
>
> --
> Andrew Gabriel
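As a footnote to the elevator discussion quoted above, here is a self-contained sketch of the "never insert into the current sweep" trick Andrew describes: new requests are only ever added to the sweep being built, so the worst-case wait is bounded by one queue's worth of I/O rather than a whole-disk pass that keeps being extended by fresh arrivals. The sweep_sched structure and function names are invented for illustration and are not taken from any real driver.

/*
 * Sketch of the "never insert into the current sweep" scheme: the sweep
 * being serviced is never modified, so a request queued for the far end
 * of the disk waits at most one sweep plus one queue full.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct req {
	uint64_t	 blkno;
	struct req	*next;
};

struct sweep_sched {
	struct req	*current;	/* sweep being serviced; never touched */
	struct req	*next_sweep;	/* sweep being built, ascending order */
};

/* Queue a request for the *next* sweep, kept in ascending block order. */
static void
sweep_enqueue(struct sweep_sched *s, struct req *r)
{
	struct req **pp = &s->next_sweep;

	while (*pp != NULL && (*pp)->blkno <= r->blkno)
		pp = &(*pp)->next;
	r->next = *pp;
	*pp = r;
}

/* Take the next request to issue; swap sweeps when the current one drains. */
static struct req *
sweep_dequeue(struct sweep_sched *s)
{
	struct req *r;

	if (s->current == NULL) {		/* current sweep exhausted */
		s->current = s->next_sweep;	/* promote the sweep built meanwhile */
		s->next_sweep = NULL;
	}
	r = s->current;
	if (r != NULL)
		s->current = r->next;
	return (r);
}

int
main(void)
{
	struct sweep_sched s = { NULL, NULL };
	uint64_t blocks[] = { 900, 10, 500, 20 };
	struct req *r;
	size_t i;

	for (i = 0; i < sizeof (blocks) / sizeof (blocks[0]); i++) {
		r = malloc(sizeof (*r));
		r->blkno = blocks[i];
		sweep_enqueue(&s, r);
	}
	/* Requests come back as one ascending sweep: 10 20 500 900. */
	while ((r = sweep_dequeue(&s)) != NULL) {
		printf("%llu ", (unsigned long long)r->blkno);
		free(r);
	}
	printf("\n");
	return (0);
}

A real scheduler would also track the head position and sweep direction, and, as noted in the thread, fold in priorities (synchronous vs. asynchronous) and the age of queued entries; the single ascending sweep here is just enough to show the bounded-latency property.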