On Jun 19, 2011, at 6:04 AM, Andrew Gabriel wrote:
> Richard Elling wrote:
>> Actually, all of the data I've gathered recently shows that the number of 
>> IOPS does not significantly increase with queue depth for HDDs running 
>> random workloads. However, the response time does :-( My data is leading me 
>> to want to restrict the queue depth to 1 or 2 for HDDs.
>>  
> 
> Thinking out loud here, but if you can queue up enough random I/Os, the 
> embedded disk controller can probably do a good job reordering them into a less 
> random elevator sweep pattern, and increase IOPS by reducing the total 
> seek time, which may be why IOPS does not drop as much as one might imagine 
> if you think of the heads doing random seeks (they aren't random anymore). 
> However, this requires that there's a reasonable queue of I/Os for the 
> controller to optimise, and processing that queue will necessarily increase 
> the average response time. If you run with a queue depth of 1 or 2, the 
> controller can't do this.

I agree. And disksort is in the mix, too.
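
To put Andrew's point in concrete terms, here is a toy sketch (plain C,
illustration only, not sd or disksort code; all names made up) comparing
total head travel when the same queue of random block addresses is serviced
in FIFO order versus as one sorted elevator sweep:

/*
 * Toy model: total head movement for servicing the same queue of random
 * block addresses in FIFO order vs. one ascending elevator sweep.
 * Purely illustrative; not sd or disksort code.
 */
#include <stdio.h>
#include <stdlib.h>

#define QDEPTH  32
#define NBLOCKS 1000000L

static int
cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Sum of seek distances when requests are serviced in the given order. */
static long
seek_cost(const long *req, int n, long start)
{
    long cost = 0, pos = start;
    int i;

    for (i = 0; i < n; i++) {
        cost += labs(req[i] - pos);
        pos = req[i];
    }
    return cost;
}

int
main(void)
{
    long fifo[QDEPTH], sweep[QDEPTH];
    int i;

    srandom(1);
    for (i = 0; i < QDEPTH; i++)
        fifo[i] = sweep[i] = random() % NBLOCKS;

    /* One ascending sweep: roughly what the drive's reordering approximates. */
    qsort(sweep, QDEPTH, sizeof (long), cmp_long);

    printf("FIFO seek distance:     %ld blocks\n", seek_cost(fifo, QDEPTH, 0));
    printf("Elevator seek distance: %ld blocks\n", seek_cost(sweep, QDEPTH, 0));
    return 0;
}

With 32 queued requests the sweep typically cuts total travel by roughly an
order of magnitude; with a queue depth of 1 or 2 there is nothing to sort,
which is exactly the trade-off above.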

> This is something I played with ~30 years ago, when the OS disk driver was 
> responsible for queuing and reordering disk transfers to reduce total 
> seek time, and disk controllers were dumb.

...and disksort still survives... maybe we should kill it?

> There are lots of options and compromises, generally weighing reduction in 
> total seek time against longest response time. Best reduction in total seek 
> time comes from planning out your elevator sweep, and inserting newly queued 
> requests into the right position in the sweep ahead. That also gives the worst 
> potential response time, as you may have one transfer queued for the 
> far end of the disk, whilst you keep getting new transfers queued for the 
> track just in front of you, and you might end up reading or writing the whole 
> disk before you get to do that transfer which is queued for the far end. If 
> you can get a big enough queue, you can modify the insertion algorithm to 
> never insert into the current sweep, so you are effectively planning two 
> sweeps ahead. Then the worst response time becomes the time to process one 
> queue full, rather than the time to read or write the whole disk. Lots of 
> other tricks too (e.g. insertion into sweeps taking into account priority, such 
> as whether the I/O is synchronous or asynchronous, and the age of existing queue 
> entries). I had much fun playing with this at the time.
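
A rough sketch of the "plan two sweeps ahead" insertion rule described
above: new arrivals only ever join the next sweep, never the one in
progress, so no request waits longer than about one queue's worth of
service. Toy code with hypothetical names, not any real driver:

/*
 * Toy two-queue elevator: new requests are never inserted into the sweep in
 * progress, only into the next one, so no request is starved for longer
 * than one full sweep.  Hypothetical names, illustration only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QMAX 64

struct sweep {
    long blk[QMAX];
    int  n;
};

static struct sweep cur, nxt;

static int
cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Arrivals always join the *next* sweep, even if the head has not yet
 * passed their block in the current one. */
static void
enqueue(long blk)
{
    if (nxt.n < QMAX)
        nxt.blk[nxt.n++] = blk;
}

/* Service the current sweep in ascending block order, then promote the
 * requests that arrived while it was running. */
static void
run_sweep(void)
{
    int i;

    qsort(cur.blk, cur.n, sizeof (long), cmp_long);
    for (i = 0; i < cur.n; i++)
        printf("service block %ld\n", cur.blk[i]);

    cur = nxt;
    memset(&nxt, 0, sizeof (nxt));
}

int
main(void)
{
    int i;

    srandom(1);
    for (i = 0; i < 8; i++)                 /* first batch: current sweep */
        cur.blk[cur.n++] = random() % 100000;
    for (i = 0; i < 8; i++)                 /* these arrive mid-sweep */
        enqueue(random() % 100000);

    run_sweep();    /* first batch only, in block order */
    run_sweep();    /* then the late arrivals */
    return 0;
}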

The other wrinkle for ZFS is that the priority scheduler can't re-order I/Os once 
they have been sent to the disk. So it might make better sense for ZFS to keep 
the disk queue depth small for HDDs.
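
As a back-of-the-envelope illustration of that trade-off: once a request is
on the drive it is out of ZFS's hands, so the worst-case wait for a
late-arriving synchronous I/O grows with the device queue depth. The 8 ms
per random HDD I/O below is just an assumed figure for the example:

#include <stdio.h>

#define SVC_MS 8    /* assumed average service time for one random HDD I/O */

/*
 * Worst-case wait for a high-priority (e.g. synchronous) I/O that arrives
 * just after the host has filled the device queue: it cannot jump ahead of
 * commands already on the drive, only ahead of what is still queued
 * host-side.
 */
static int
worst_wait_ms(int device_qdepth)
{
    return device_qdepth * SVC_MS;
}

int
main(void)
{
    int d;

    for (d = 1; d <= 10; d++)
        printf("device queue depth %2d -> sync I/O may wait ~%3d ms\n",
            d, worst_wait_ms(d));
    return 0;
}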
 -- richard
