Hi Qihua, there are many reasons why the recordsize does not govern the I/O size directly: metadata I/O is one, aggregation by the ZFS I/O scheduler is another, and the application's own behavior may be a third.
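To see which physical I/O sizes ZFS actually issues to the disks (zpool iostat only shows per-interval averages), a DTrace one-liner along the following lines will print a histogram. This is just a sketch, assuming DTrace is available on your host; interrupt it with Ctrl-C to get the output:

  # dtrace -n 'io:::start { @["physical I/O size (bytes)"] = quantize(args[0]->b_bcount); }'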
Make sure to create the DB files after modifying the ZFS property; the new recordsize only applies to files written after the change.
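For instance, something like this (a sketch only: the dataset name is taken from your output below, the mountpoint and datafile names are hypothetical, and the database should be shut down while the files are rewritten):

  zfs set recordsize=8k phximddb03data/zuc4arch/data01
  # existing files keep their old 128K blocks, so rewrite each one
  # to have ZFS reallocate it with 8K records
  cp /zuc4arch/data01/system01.dbf /zuc4arch/data01/system01.dbf.tmp
  mv /zuc4arch/data01/system01.dbf.tmp /zuc4arch/data01/system01.dbf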
-r

On Dec 26, 2008, at 11:49, qihua wu wrote:

> After I changed the recordsize to 8k, it seems the read/write size is
> not always 8k when checking with zpool iostat. So ZFS doesn't obey the
> recordsize strictly?
>
> UC4-zuc4arch$> zfs get recordsize
> NAME                            PROPERTY    VALUE  SOURCE
> phximddb03data/zuc4arch/data01  recordsize  8K     local
> phximddb03data/zuc4arch/data02  recordsize  8K     local
>
> UC4-zuc4arch$> zpool iostat phximddb03data 1
>                    capacity     operations    bandwidth
> pool             used  avail   read  write   read  write
> --------------  -----  -----  -----  -----  -----  -----
> phximddb03data   487G   903G     13     62  1.26M  2.98M
> phximddb03data   487G   903G    518      1  4.05M  23.8K   ===> here a write is of size 24k
> phximddb03data   487G   903G    456     37  3.58M   111K
> phximddb03data   487G   903G    551      0  4.34M  11.9K
> phximddb03data   487G   903G    496      8  3.86M   239K
> phximddb03data   487G   903G    472    229  3.68M   982K
> phximddb03data   487G   903G    499      3  3.91M  3.96K
> phximddb03data   487G   903G    525    138  4.12M   631K
> phximddb03data   487G   903G    497      0  3.89M      0
> phximddb03data   487G   903G    562      0  4.38M      0
> phximddb03data   487G   903G    337      3  2.63M  47.5K
> phximddb03data   487G   903G    140     35  4.55M  4.23M   ===> here a write is of size 128k
> phximddb03data   487G   903G    484    272  7.12M  5.44M
> phximddb03data   487G   903G    562      0  4.49M   127K
> phximddb03data   487G   903G    514      4  4.03M   301K
> phximddb03data   487G   903G    505     27  3.99M  1.00M
> phximddb03data   487G   903G    518     14  4.10M   692K
> phximddb03data   487G   903G    518      1  4.11M  14.4K
> phximddb03data   487G   903G    504      2  3.98M   151K
> phximddb03data   487G   903G    531      3  4.17M   392K
> phximddb03data   487G   903G    375      2  2.95M   380K
> phximddb03data   487G   903G    304      5  2.40M   296K
> phximddb03data   487G   903G    438      3  3.45M   277K
> phximddb03data   487G   903G    376      0  3.00M      0
> phximddb03data   487G   903G    239     15  2.84M  1.98M
> phximddb03data   487G   903G    221    857  4.51M  16.8M   ===> here a read is of size 20k
>
> On Thu, Dec 25, 2008 at 12:25 PM, Neil Perrin <neil.per...@sun.com> wrote:
> The default recordsize is 128K. So you are correct: for random reads,
> performance will be bad, as excess data is read. For Oracle it is
> recommended to set the recordsize to 8k. This can be done when creating
> the filesystem, using 'zfs create -o recordsize=8k <fs>'. If the fs has
> already been created, then you can use 'zfs set recordsize=8k <fs>';
> *however*, this only takes effect for new files, so existing databases
> will retain the old block size.
>
> Hope this helps.
>
> Neil.
>
> qihua wu wrote:
> > Hi, All,
> >
> > We have an Oracle standby running on ZFS, and the database recovers
> > very, very slowly. The problem is that the I/O performance is very
> > bad. I find the recordsize of the ZFS filesystem is 128K, and the
> > Oracle block size is 8K.
> >
> > My question is: when Oracle tries to write an 8K block, will ZFS
> > read in 128K and then write 128K? If that's the case, then ZFS will
> > do 16 (128K/8K = 16) times as much I/O as necessary.
> >
> >                     extended device statistics
> >  r/s   w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
> >  0.0   0.2    0.0    1.6   0.0   0.0     6.0     7.7   0   0  md4
> >  0.0   0.2    0.0    1.6   0.0   0.0     0.0     7.4   0   0  md14
> >  0.0   0.2    0.0    1.6   0.0   0.0     0.0     7.6   0   0  md24
> >  0.0   0.4    0.0    1.7   0.0   0.0     0.0     6.7   0   0  sd0
> >  0.0   0.4    0.0    1.7   0.0   0.0     0.0     6.5   0   0  sd2
> >  0.0   1.4    0.0  105.2   0.0   4.9     0.0  3503.3   0 100  ssd97
> >  0.0   3.0    0.0  384.0   0.0  10.0     0.0  3332.9   0 100  ssd99
> >  0.0   2.6    0.0  332.8   0.0  10.0     0.0  3845.7   0 100  ssd101
> >  0.0   4.4    0.0  563.3   0.0  10.0     0.0  2272.4   0 100  ssd103
> >  0.0   3.4    0.0  435.2   0.0  10.0     0.0  2940.8   0 100  ssd105
> >  0.0   3.6    0.0  460.8   0.0  10.0     0.0  2777.4   0 100  ssd107
> >  0.0   0.2    0.0   25.6   0.0   0.0     0.0    72.8   0   1  ssd112
> >
> > UC4-zuc4arch$> zfs list -o recordsize
> > RECSIZE
> > 128K
> > 128K
> > 128K
> > 128K
> > 128K
> > 128K
> > 128K
> > 128K
> > 128K
> >
> > Thanks,
> > Daniel