For SSDs, we have code in illumos that disables disksort. Ultimately, we believe the cost of disksort is in the noise for performance.
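To illustrate what is being skipped, here is a minimal sketch in C of the general idea: keep the pending queue sorted by block number when the media has a seek penalty, and fall back to plain FIFO when it doesn't. This is not the actual illumos sd/disksort code; the io_queue/io_req layout and the is_rotational flag are assumptions made up for the example.

/*
 * Minimal sketch (not the illumos disksort code): sorted insertion for
 * rotating media, plain FIFO for SSDs where ordering buys nothing.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct io_req {
	uint64_t	 blkno;		/* starting block of the transfer */
	struct io_req	*next;
};

struct io_queue {
	struct io_req	*head;
	bool		 is_rotational;	/* false for SSDs (assumed flag) */
};

/* Queue a request, sorting only when seek order actually matters. */
static void
io_enqueue(struct io_queue *q, struct io_req *r)
{
	struct io_req **pp = &q->head;

	if (q->is_rotational) {
		/* Rotating media: insert in ascending block order. */
		while (*pp != NULL && (*pp)->blkno <= r->blkno)
			pp = &(*pp)->next;
	} else {
		/* SSD: no seek penalty, so just append at the tail (FIFO). */
		while (*pp != NULL)
			pp = &(*pp)->next;
	}
	r->next = *pp;
	*pp = r;
}

int
main(void)
{
	struct io_queue q = { NULL, true };	/* pretend rotating media */
	struct io_req reqs[4] = {
		{ 900, NULL }, { 10, NULL }, { 500, NULL }, { 20, NULL }
	};
	struct io_req *r;
	size_t i;

	for (i = 0; i < 4; i++)
		io_enqueue(&q, &reqs[i]);

	/* Prints 10 20 500 900: one ascending sweep for the heads. */
	for (r = q.head; r != NULL; r = r->next)
		printf("%llu ", (unsigned long long)r->blkno);
	printf("\n");
	return (0);
}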
-- Garrett D'Amore

On Jun 20, 2011, at 8:38 AM, "Andrew Gabriel" <andrew.gabr...@oracle.com> wrote:

> Richard Elling wrote:
>> On Jun 19, 2011, at 6:04 AM, Andrew Gabriel wrote:
>>
>>> Richard Elling wrote:
>>>
>>>> Actually, all of the data I've gathered recently shows that the number
>>>> of IOPS does not significantly increase for HDDs running random
>>>> workloads. However, the response time does :-( My data is leading me
>>>> to want to restrict the queue depth to 1 or 2 for HDDs.
>>>>
>>> Thinking out loud here, but if you can queue up enough random I/Os, the
>>> embedded disk controller can probably do a good job reordering them into
>>> a less random elevator-sweep pattern, and increase IOPS by reducing the
>>> total seek time, which may be why IOPS does not drop as much as one
>>> might imagine if you think of the heads doing random seeks (they aren't
>>> random anymore). However, this requires that there's a reasonable queue
>>> of I/Os for the controller to optimise, and processing that queue will
>>> necessarily increase the average response time. If you run with a queue
>>> depth of 1 or 2, the controller can't do this.
>>>
>>
>> I agree. And disksort is in the mix, too.
>>
>
> Oh, I'd never looked at that.
>
>>> This is something I played with ~30 years ago, when the OS disk driver
>>> was responsible for queuing and reordering disc transfers to reduce
>>> total seek time, and disk controllers were dumb.
>>>
>>
>> ...and disksort still survives... maybe we should kill it?
>>
>
> It looks like it's possibly slightly worse than the pathological
> worst-response-time case I described below...
>
>>> There are lots of options and compromises, generally weighing reduction
>>> in total seek time against longest response time. The best reduction in
>>> total seek time comes from planning out your elevator sweep and
>>> inserting newly queued requests into the right position in the sweep
>>> ahead. That also gives the potentially worst response time, as you may
>>> have one transfer queued for the far end of the disk, whilst you keep
>>> getting new transfers queued for the track just in front of you, and
>>> you might end up reading or writing the whole disk before you get to do
>>> that transfer which is queued for the far end. If you can get a big
>>> enough queue, you can modify the insertion algorithm to never insert
>>> into the current sweep, so you are effectively planning two sweeps
>>> ahead. Then the worst response time becomes the time to process one
>>> queue full, rather than the time to read or write the whole disk. There
>>> are lots of other tricks too (e.g. insertion into sweeps taking into
>>> account priority, such as whether the I/O is synchronous or
>>> asynchronous, and the age of existing queue entries). I had much fun
>>> playing with this at the time.
>>>
>>
>> The other wrinkle for ZFS is that the priority scheduler can't re-order
>> I/Os sent to the disk.
>>
>
> Does that also go through disksort? Disksort doesn't seem to have any
> concept of priorities (but I haven't looked in detail where it plugs in
> to the whole framework).
>
>> So it might make better sense for ZFS to keep the disk queue depth small
>> for HDDs.
>> -- richard
>>
>
> --
> Andrew Gabriel
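As a footnote to the elevator discussion quoted above, here is a self-contained sketch of the "never insert into the current sweep" trick Andrew describes: new requests are only ever added to the sweep being built, so the worst-case wait is bounded by one queue's worth of I/O rather than a whole-disk pass that keeps being extended by fresh arrivals. The sweep_sched structure and function names are invented for illustration and are not taken from any real driver.

/*
 * Sketch of the "never insert into the current sweep" scheme: the sweep
 * being serviced is never modified, so a request queued for the far end
 * of the disk waits at most one sweep plus one queue full.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct req {
	uint64_t	 blkno;
	struct req	*next;
};

struct sweep_sched {
	struct req	*current;	/* sweep being serviced; never touched */
	struct req	*next_sweep;	/* sweep being built, ascending order */
};

/* Queue a request for the *next* sweep, kept in ascending block order. */
static void
sweep_enqueue(struct sweep_sched *s, struct req *r)
{
	struct req **pp = &s->next_sweep;

	while (*pp != NULL && (*pp)->blkno <= r->blkno)
		pp = &(*pp)->next;
	r->next = *pp;
	*pp = r;
}

/* Take the next request to issue; swap sweeps when the current one drains. */
static struct req *
sweep_dequeue(struct sweep_sched *s)
{
	struct req *r;

	if (s->current == NULL) {		/* current sweep exhausted */
		s->current = s->next_sweep;	/* promote the sweep built meanwhile */
		s->next_sweep = NULL;
	}
	r = s->current;
	if (r != NULL)
		s->current = r->next;
	return (r);
}

int
main(void)
{
	struct sweep_sched s = { NULL, NULL };
	uint64_t blocks[] = { 900, 10, 500, 20 };
	struct req *r;
	size_t i;

	for (i = 0; i < sizeof (blocks) / sizeof (blocks[0]); i++) {
		r = malloc(sizeof (*r));
		r->blkno = blocks[i];
		sweep_enqueue(&s, r);
	}
	/* Requests come back as one ascending sweep: 10 20 500 900. */
	while ((r = sweep_dequeue(&s)) != NULL) {
		printf("%llu ", (unsigned long long)r->blkno);
		free(r);
	}
	printf("\n");
	return (0);
}

A real scheduler would also track the head position and sweep direction, and, as noted in the thread, fold in priorities (synchronous vs. asynchronous) and the age of queued entries; the single ascending sweep here is just enough to show the bounded-latency property.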