[exposed organs below…]

On Oct 7, 2011, at 8:25 PM, Daniel Carosone wrote:

> On Tue, Oct 04, 2011 at 09:28:36PM -0700, Richard Elling wrote:
>> On Oct 4, 2011, at 4:14 PM, Daniel Carosone wrote:
>>
>>> I sent it twice, because something strange happened on the first send,
>>> to the ashift=12 pool. "zfs list -o space" showed figures at least
>>> twice those on the source, maybe roughly 2.5 times.
>>
>> Can you share the output?
>
> Source machine, zpool v14 snv_111b:
>
> NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
> int/iscsi_01   99.2G   237G     37.9G    199G              0          0     200G
>
> Destination machine, zpool v31 snv_151b:
>
> NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
> geek/iscsi_01  3.64T   550G     88.4G    461G              0          0     200G
> uext/iscsi_01  1.73T   245G     39.2G    206G              0          0     200G
>
> geek is the ashift=12 pool, obviously. I'm assuming the smaller
> difference for uext is due to other layout differences in the pool
> versions.
>
>>> What is going on? Is there really that much metadata overhead? How
>>> many metadata blocks are needed for each 8k vol block, and are they
>>> each really only holding 512 bytes of metadata in a 4k allocation?
>>> Can they not be packed appropriately for the ashift?
>>
>> Doesn't matter how small metadata compresses, the minimum size you can write
>> is 4KB.
>
> This isn't about whether the metadata compresses, this is about
> whether ZFS is smart enough to use all the space in a 4k block for
> metadata, rather than assuming it can fit at best 512 bytes,
> regardless of ashift. By packing, I meant packing them full rather
> than leaving them mostly empty and wasted (or anything to do with
> compression).
The answer is: it depends. Let's look for more clues first...

>
>> I think we'd need to see the exact layout of the internal data. This can be
>> achieved with the zfs_blkstats macro in mdb. Perhaps we can take this offline
>> and report back?
>
> Happy to - what other details / output would you like?

This is easier to do offline, but while we're here…
[assuming a Solaris-derived OS with mdb]

0. scrub the pool, so that the block usage stats are loaded

1. find the address of the pool's spa structure, for example

   # echo ::spa | mdb -k
   ADDR                 STATE NAME
   ffffff01c647d580    ACTIVE stuff
   ffffff01c52b1040    ACTIVE syspool

2. look at the block usage stats, for example

   # echo ffffff01c52b1040::zfs_blkstats | mdb -k
   Dittoed blocks on same vdev: 4541

   Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
        1    16K      1K   3.00K   3.00K   16.00     0.00  object directory
        3  1.50K   1.50K   4.50K   1.50K    1.00     0.00  object array
      163  19.8M   1.46M   4.39M   27.6K   13.52     0.28  bpobj
      336  1.79M    724K   2.12M   6.46K    2.53     0.13  SPA space map
   …

3. compare the block usage stats for the various pools

Block counts are obvious
LSIZE  = logical size
PSIZE  = physical size, after compression
ASIZE  = allocated size, how much disk space is used (including raidz & copies)
avg    = average allocated size per block
comp   = compression ratio (LSIZE:PSIZE)
%Total = percent of total allocated space

It should be obvious that ashift = 9 for the above example: several of the
ASIZE figures (3.00K, 4.50K) are odd multiples of 512 bytes well below 4KB,
which could not happen if every allocation were rounded up to 4KB.
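As a quick cross-check of the two destination pools (a sketch only, assuming
the pool names geek and uext from above and that zdb is available on the
snv_151 box), the ashift of each top-level vdev is recorded in the cached
pool configuration:

   # zdb -C geek | grep ashift
   # zdb -C uext | grep ashift

If geek really is the 4KB-sector pool, the first command should report
ashift: 12 and the second ashift: 9.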
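For step 3, it can help to capture each pool's stats to a file and compare the
interesting rows side by side. A minimal sketch, where geekaddr and uextaddr
are stand-ins for the real spa addresses printed in step 1, and assuming the
zvol's data shows up under a type label containing "zvol" (the exact labels
may differ between builds):

   # echo geekaddr::zfs_blkstats | mdb -k > /tmp/geek.blkstats
   # echo uextaddr::zfs_blkstats | mdb -k > /tmp/uext.blkstats
   # egrep 'zvol|Total' /tmp/geek.blkstats /tmp/uext.blkstats

Comparing the ASIZE column for the zvol rows against the overall totals for
each pool should show whether the extra space on geek is going to the
volume's own 8K data blocks or to metadata.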
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
VMworld Copenhagen, October 17-20
OpenStorage Summit, San Jose, CA, October 24-27
LISA '11, Boston, MA, December 4-9

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss