Hi Qihua, there are several reasons why the recordsize does not govern
the I/O size directly: metadata I/O is one, ZFS I/O scheduler
aggregation is another, and the application's own behavior can be a third.
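
For instance, dividing bandwidth by operations in the zpool iostat output
below gives the average physical I/O size for each one-second interval, and
that average is rarely exactly 8K (the notes in parentheses are plausible
readings, not something iostat reports):

   23.8K /   1 write  ~=  24K per write  (e.g. one 8K data block plus metadata)
   4.23M /  35 writes ~= 124K per write  (several 8K writes aggregated)
   4.51M / 221 reads  ~=  20K per read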

Also make sure to create the DB files after modifying the ZFS property; existing files keep their old block size.
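
For example (a rough sketch: the mount point and datafile name below are
made up, and the database or tablespace must be offline while the files are
rewritten):

   # the new recordsize only applies to files created after the change
   zfs set recordsize=8k phximddb03data/zuc4arch/data01
   # rewriting an existing datafile as a new file picks up the 8K recordsize
   cp /zuc4arch/data01/system01.dbf /zuc4arch/data01/system01.dbf.new
   mv /zuc4arch/data01/system01.dbf.new /zuc4arch/data01/system01.dbf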

-r

On Dec 26, 2008, at 11:49, qihua wu wrote:

> After I changed the recordsize to 8k, the read/write size is not always 8k
> when I check with zpool iostat. So ZFS doesn't obey the recordsize strictly?
>
> UC4-zuc4arch$> zfs get recordsize
> NAME                            PROPERTY    VALUE  SOURCE
> phximddb03data/zuc4arch/data01  recordsize  8K     local
> phximddb03data/zuc4arch/data02  recordsize  8K     local
>
>
>
> UC4-zuc4arch$>  zpool iostat  phximddb03data 1
>                    capacity     operations    bandwidth
> pool             used  avail   read  write   read  write
> --------------  -----  -----  -----  -----  -----  -----
> phximddb03data   487G   903G     13     62  1.26M  2.98M
> phximddb03data   487G   903G    518      1  4.05M  23.8K   ===> here a write is of size 24k
> phximddb03data   487G   903G    456     37  3.58M   111K
> phximddb03data   487G   903G    551      0  4.34M  11.9K
> phximddb03data   487G   903G    496      8  3.86M   239K
> phximddb03data   487G   903G    472    229  3.68M   982K
> phximddb03data   487G   903G    499      3  3.91M  3.96K
> phximddb03data   487G   903G    525    138  4.12M   631K
> phximddb03data   487G   903G    497      0  3.89M      0
> phximddb03data   487G   903G    562      0  4.38M      0
> phximddb03data   487G   903G    337      3  2.63M  47.5K
> phximddb03data   487G   903G    140     35  4.55M  4.23M   ===> here a write is of size 128k.
> phximddb03data   487G   903G    484    272  7.12M  5.44M
> phximddb03data   487G   903G    562      0  4.49M   127K
> phximddb03data   487G   903G    514      4  4.03M   301K
> phximddb03data   487G   903G    505     27  3.99M  1.00M
> phximddb03data   487G   903G    518     14  4.10M   692K
> phximddb03data   487G   903G    518      1  4.11M  14.4K
> phximddb03data   487G   903G    504      2  3.98M   151K
> phximddb03data   487G   903G    531      3  4.17M   392K
> phximddb03data   487G   903G    375      2  2.95M   380K
> phximddb03data   487G   903G    304      5  2.40M   296K
> phximddb03data   487G   903G    438      3  3.45M   277K
> phximddb03data   487G   903G    376      0  3.00M      0
> phximddb03data   487G   903G    239     15  2.84M  1.98M
> phximddb03data   487G   903G    221    857  4.51M  16.8M   ===> here a read is of size 20k.
>
>
>
> On Thu, Dec 25, 2008 at 12:25 PM, Neil Perrin <neil.per...@sun.com> wrote:
> The default recordsize is 128K, so you are correct: for random reads,
> performance will be bad as excess data is read. For Oracle it is recommended
> to set the recordsize to 8k. This can be done when creating the filesystem
> using 'zfs create -o recordsize=8k <fs>'. If the fs has already been created,
> then you can use 'zfs set recordsize=8k <fs>'; *however*, this only takes
> effect for new files, so existing databases will retain the old block size.
>
> Hope this helps,
>
> Neil.
>
>
> qihua wu wrote:
> Hi, All,
>
> We have an Oracle standby running on ZFS and the database recovers very
> slowly. The problem is that I/O performance is very bad. I find the
> recordsize of the ZFS filesystems is 128K, while the Oracle block size is 8K.
>
> My question is: when Oracle writes an 8k block, will ZFS read in 128K and
> then write 128K? If that is the case, ZFS will do 16 times (128k/8k = 16)
> as much I/O as necessary.
>
>                    extended device statistics
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0    0.2    0.0    1.6  0.0  0.0    6.0    7.7   0   0 md4
>    0.0    0.2    0.0    1.6  0.0  0.0    0.0    7.4   0   0 md14
>    0.0    0.2    0.0    1.6  0.0  0.0    0.0    7.6   0   0 md24
>    0.0    0.4    0.0    1.7  0.0  0.0    0.0    6.7   0   0 sd0
>    0.0    0.4    0.0    1.7  0.0  0.0    0.0    6.5   0   0 sd2
>    0.0    1.4    0.0  105.2  0.0  4.9    0.0 3503.3   0 100 ssd97
>    0.0    3.0    0.0  384.0  0.0 10.0    0.0 3332.9   0 100 ssd99
>    0.0    2.6    0.0  332.8  0.0 10.0    0.0 3845.7   0 100 ssd101
>    0.0    4.4    0.0  563.3  0.0 10.0    0.0 2272.4   0 100 ssd103
>    0.0    3.4    0.0  435.2  0.0 10.0    0.0 2940.8   0 100 ssd105
>    0.0    3.6    0.0  460.8  0.0 10.0    0.0 2777.4   0 100 ssd107
>    0.0    0.2    0.0   25.6  0.0  0.0    0.0   72.8   0   1 ssd112
>
> UC4-zuc4arch$> zfs list -o recordsize
> RECSIZE
>   128K
>   128K
>   128K
>   128K
>   128K
>   128K
>   128K
>   128K
>   128K
>
> Thanks,
> Daniel

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
