After I changed the recordsize to 8K, the read/write size reported by zpool iostat is still not always 8K. So ZFS doesn't obey the recordsize strictly?
UC4-zuc4arch$> zfs get recordsize
NAME                            PROPERTY    VALUE  SOURCE
phximddb03data/zuc4arch/data01  recordsize  8K     local
phximddb03data/zuc4arch/data02  recordsize  8K     local

UC4-zuc4arch$> zpool iostat phximddb03data 1
                   capacity     operations    bandwidth
pool             used  avail   read  write   read  write
--------------  -----  -----  -----  -----  -----  -----
phximddb03data   487G   903G     13     62  1.26M  2.98M
phximddb03data   487G   903G    518      1  4.05M  23.8K   ===> here a write is of size 24k
phximddb03data   487G   903G    456     37  3.58M   111K
phximddb03data   487G   903G    551      0  4.34M  11.9K
phximddb03data   487G   903G    496      8  3.86M   239K
phximddb03data   487G   903G    472    229  3.68M   982K
phximddb03data   487G   903G    499      3  3.91M  3.96K
phximddb03data   487G   903G    525    138  4.12M   631K
phximddb03data   487G   903G    497      0  3.89M      0
phximddb03data   487G   903G    562      0  4.38M      0
phximddb03data   487G   903G    337      3  2.63M  47.5K
phximddb03data   487G   903G    140     35  4.55M  4.23M   ===> here a write is of size 128k
phximddb03data   487G   903G    484    272  7.12M  5.44M
phximddb03data   487G   903G    562      0  4.49M   127K
phximddb03data   487G   903G    514      4  4.03M   301K
phximddb03data   487G   903G    505     27  3.99M  1.00M
phximddb03data   487G   903G    518     14  4.10M   692K
phximddb03data   487G   903G    518      1  4.11M  14.4K
phximddb03data   487G   903G    504      2  3.98M   151K
phximddb03data   487G   903G    531      3  4.17M   392K
phximddb03data   487G   903G    375      2  2.95M   380K
phximddb03data   487G   903G    304      5  2.40M   296K
phximddb03data   487G   903G    438      3  3.45M   277K
phximddb03data   487G   903G    376      0  3.00M      0
phximddb03data   487G   903G    239     15  2.84M  1.98M
phximddb03data   487G   903G    221    857  4.51M  16.8M   ==> here a read is of size 20k
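For reference, zpool iostat shows per-interval totals for the whole pool (file data, metadata and intent-log writes together), so the sizes noted above are just that second's write bandwidth divided by write operations: an average over everything the pool wrote, not the size of any single record. The recordsize property caps the block size of file data but doesn't force every physical I/O to be exactly 8K. A rough sketch of computing the same per-interval average, assuming nawk and the K/M suffixes zpool iostat prints above (only the pool name comes from this thread):

UC4-zuc4arch$> zpool iostat phximddb03data 1 | nawk '
    $5 + 0 > 0 {                          # skip headers and intervals with no writes
        bw = $7 + 0                       # numeric part of the write bandwidth column
        if ($7 ~ /K$/) bw *= 1024         # scale the K/M suffixes zpool iostat prints
        if ($7 ~ /M$/) bw *= 1048576
        printf "%-15s avg write size: %.1fK\n", $1, bw / $5 / 1024
    }'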
On Thu, Dec 25, 2008 at 12:25 PM, Neil Perrin <neil.per...@sun.com> wrote:
> The default recordsize is 128K. So you are correct, for random reads
> performance will be bad as excess data is read. For Oracle it is
> recommended to set the recordsize to 8k. This can be done when creating
> the filesystem using 'zfs create -o recordsize=8k <fs>'. If the fs has
> already been created then you can use 'zfs set recordsize=8k <fs>',
> *however* this only takes effect for new files, so existing databases
> will retain the old block size.
>
> Hope this helps:
>
> Neil.
>
> qihua wu wrote:
>> Hi, All,
>>
>> We have an Oracle standby running on ZFS and the database recovers very,
>> very slowly. The problem is that the IO performance is very bad. I find
>> that the recordsize of the ZFS filesystems is 128K, and the Oracle block
>> size is 8K.
>>
>> My question is:
>> When Oracle tries to write an 8K block, will ZFS read in 128K and then
>> write 128K? If that's the case, then ZFS will do 16 (128K/8K = 16) times
>> more IO than necessary.
>>
>>                     extended device statistics
>>     r/s   w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>>     0.0   0.2    0.0    1.6   0.0   0.0     6.0     7.7   0   0  md4
>>     0.0   0.2    0.0    1.6   0.0   0.0     0.0     7.4   0   0  md14
>>     0.0   0.2    0.0    1.6   0.0   0.0     0.0     7.6   0   0  md24
>>     0.0   0.4    0.0    1.7   0.0   0.0     0.0     6.7   0   0  sd0
>>     0.0   0.4    0.0    1.7   0.0   0.0     0.0     6.5   0   0  sd2
>>     0.0   1.4    0.0  105.2   0.0   4.9     0.0  3503.3   0 100  ssd97
>>     0.0   3.0    0.0  384.0   0.0  10.0     0.0  3332.9   0 100  ssd99
>>     0.0   2.6    0.0  332.8   0.0  10.0     0.0  3845.7   0 100  ssd101
>>     0.0   4.4    0.0  563.3   0.0  10.0     0.0  2272.4   0 100  ssd103
>>     0.0   3.4    0.0  435.2   0.0  10.0     0.0  2940.8   0 100  ssd105
>>     0.0   3.6    0.0  460.8   0.0  10.0     0.0  2777.4   0 100  ssd107
>>     0.0   0.2    0.0   25.6   0.0   0.0     0.0    72.8   0   1  ssd112
>>
>> UC4-zuc4arch$> zfs list -o recordsize
>> RECSIZE
>>    128K
>>    128K
>>    128K
>>    128K
>>    128K
>>    128K
>>    128K
>>    128K
>>    128K
>>
>> Thanks,
>> Daniel
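Following up on Neil's point that the property only takes effect for new files: an existing standby keeps its 128K blocks until the datafiles themselves are rewritten after the change. A minimal sketch using the dataset names from the output above (the data03 filesystem, the mountpoint and the datafile name are invented for illustration; any method that rewrites the files, such as restoring them from backup, has the same effect):

UC4-zuc4arch$> zfs create -o recordsize=8k phximddb03data/zuc4arch/data03   # new fs, 8K blocks from the start
UC4-zuc4arch$> zfs set recordsize=8k phximddb03data/zuc4arch/data01         # existing fs: affects new writes only

# rewrite an existing datafile so it is re-stored in 8K blocks
# (the database must not have the file open while this runs)
UC4-zuc4arch$> cp /zuc4arch/data01/system01.dbf /zuc4arch/data01/system01.dbf.copy
UC4-zuc4arch$> mv /zuc4arch/data01/system01.dbf.copy /zuc4arch/data01/system01.dbf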
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss