Manoj Nayak wrote:
> Roch - PAE wrote:
>   
>> Manoj Nayak writes:
>>  > Hi All.
>>  > 
>>  > The ZFS documentation says ZFS schedules its I/O in such a way that it
>>  > manages to saturate a single disk's bandwidth using enough concurrent
>>  > 128K I/Os. The number of concurrent I/Os is decided by vq_max_pending.
>>  > The default value for vq_max_pending is 35.
>>  > 
>>  > We have created a 4-disk raid-z group inside a ZFS pool on a Thumper.
>>  > The ZFS record size is set to 128K. When we read/write a 128K record,
>>  > it issues a 128K/3 I/O to each of the 3 data disks in the 4-disk
>>  > raid-z group.
>>  > 
>>  > We need to saturate the bandwidth of all three data disks in the raid-z
>>  > group. Is it required to set vq_max_pending to 35*3=105?
>>  > 
>>
>> Nope.
>>
>> Once a disk controller is working on 35 requests, we don't
>> expect to get any more out of it by queueing more requests,
>> and we might even confuse the firmware and get less.
>>
>> Now, for an array controller and a vdev fronting a large
>> number of disks, 35 might be a low number that does not allow
>> full throughput.  Rather than tuning 35 up, we suggest
>> splitting devices into smaller LUNs, since each LUN is given
>> a 35-deep queue.
>>
>>   
>>     
> It means the 4-disk raid-z group inside the ZFS pool is exported to ZFS as a
> single device (vdev), and ZFS assigns a vq_max_pending value of 35 to this
> vdev. To get higher throughput, do I need to do the following?
>   

This is not the terminology we use to describe ZFS.  Quite simply,
a storage pool contains devices configured in some way, hopefully
using  some form of data protection (mirror, raidz[12]) -- see zpool(1m).
Each storage pool can contain one or more file systems or volumes --
see zfs(1m).
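
For example, a raidz pool with a file system on top of it looks roughly
like this (the pool, file system, and disk names below are only
illustrative):

  # zpool create tank raidz c0t1d0 c0t2d0 c0t3d0 c0t4d0
  # zfs create tank/fs
  # zpool status tank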

The term "export" is used to describe transition of ownership of a
storage pool between different hosts.
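
For example (the pool name is only illustrative):

  old-host# zpool export tank
  new-host# zpool import tank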

> 1. Reduce the number of disks in the raidz group from four to three, so that
> the same pending queue of 35 is available for fewer disks.
> Or
>   

35 is for each physical disk.
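
You can watch this on a live system: with iostat on Solaris, the actv
column shows how many commands are currently outstanding at each device,
so on a busy leaf disk you would expect it to sit near 35 (a sketch; the
one-second interval is just a convenient choice):

  # iostat -xnz 1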

> 2. Create slices out of a physical disk and build the raidz group out of four
> slices of that one physical disk, so that the same pending queue of 35 is
> available to each of the four slices of one physical disk.
>   

This will likely have a negative scaling effect.  Some devices, especially
raw disks, have wimpy microprocessors and limited memory.  You can
easily overload them and see the response time increase dramatically,
just as queuing theory suggests.  Some research has shown that a
value of 8-16 is better, at least for some storage devices.  A value of 1
is perhaps too low, at least for devices which can handle multiple
outstanding I/Os.
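
If you want to experiment with a smaller queue, the global knob behind
vq_max_pending is the zfs_vdev_max_pending tunable on builds from around
this time (the name and default may differ on your release, so treat this
as a sketch rather than a recommendation):

  in /etc/system (takes effect at the next boot):
    set zfs:zfs_vdev_max_pending = 16

  or on a live system:
    # echo zfs_vdev_max_pending/W0t16 | mdb -kw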

> My workload issues around 5000 MB of read I/O & iopattern says around
> 55% of the I/O is random in nature.
> I don't know how much prefetching through the track cache is going to
> help here. Probably I can try disabling vdev_cache
> through "set 'zfs_vdev_cache_max' 1".

We can't size something like this unless we also know the I/O
size.  If you are talking small iops, say 8 kBytes, then you'll
need lots of disks.  For larger iops, you may be able to get
by with fewer disks.
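
Back of the envelope, assuming 7200 rpm SATA drives like a Thumper's and
very roughly 150 random IOPS per drive (a guess, not a measurement):

  8 kByte random reads:    150 IOPS * 8 kB   ~  1.2 MB/s per disk
  128 kByte random reads:  150 IOPS * 128 kB ~ 19 MB/s per disk

so the number of disks you need for a mostly-random workload changes by
more than an order of magnitude with the I/O size.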
 -- richard
