[exposed organs below…]

On Oct 7, 2011, at 8:25 PM, Daniel Carosone wrote:
> On Tue, Oct 04, 2011 at 09:28:36PM -0700, Richard Elling wrote:
>> On Oct 4, 2011, at 4:14 PM, Daniel Carosone wrote:
>> 
>>> I sent it twice, because something strange happened on the first send,
>>> to the ashift=12 pool.  "zfs list -o space" showed figures at least
>>> twice those on the source, maybe roughly 2.5 times.
>> 
>> Can you share the output?
> 
> Source machine, zpool v14 snv_111b:
> 
> NAME          AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
> int/iscsi_01  99.2G   237G     37.9G    199G              0          0     200G
> 
> Destination machine, zpool v31 snv_151b:
> 
> NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
> geek/iscsi_01  3.64T   550G     88.4G    461G              0          0     200G
> uext/iscsi_01  1.73T   245G     39.2G    206G              0          0     200G
> 
> geek is the ashift=12 pool, obviously.  I'm assuming the smaller
> difference for uext is due to other layout differences in the pool
> versions.
> 
>>> What is going on? Is there really that much metadata overhead?  How
>>> many metadata blocks are needed for each 8k vol block, and are they
>>> each really only holding 512 bytes of metadata in a 4k allocation?
>>> Can they not be packed appropriately for the ashift?
>> 
>> Doesn't matter how small metadata compresses, the minimum size you can write
>> is 4KB.
> 
> This isn't about whether the metadata compresses, this is about
> whether ZFS is smart enough to use all the space in a 4k block for
> metadata, rather than assuming it can fit at best 512 bytes,
> regardless of ashift.  By packing, I meant packing them full rather
> than leaving them mostly empty and wasted (or anything to do with
> compression). 

The answer is: it depends. Let's look for more clues first...
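
A couple of quick clues worth gathering first (the pool and dataset names
below are just the ones from your output, adjust as needed):

        # zfs get volblocksize geek/iscsi_01
        # zdb -C geek | grep ashift

The first confirms the zvol's block size, and the second should show
ashift: 12 in the vdev tree for the 4KB-sector pool; if -C doesn't show it
on your build, a plain "zdb geek" dump includes the same config, if I
recall correctly.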

> 
>> I think we'd need to see the exact layout of the internal data. This can be 
>> achieved with the zfs_blkstats macro in mdb. Perhaps we can take this offline
>> and report back?
> 
> Happy to - what other details / output would you like?

This is easier to do offline, but while we're here…
[assuming Solaris-derived OS with mdb]

0. scrub the pool, so that the block usage stats are loaded
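   For example, using the ashift=12 pool from your output:
        # zpool scrub geek
        # zpool status geek
   and wait until zpool status reports the scrub has completed.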

1. find the address of the pool's spa structure, for example
        # echo ::spa | mdb -k
        ADDR                 STATE NAME
        ffffff01c647d580    ACTIVE stuff
        ffffff01c52b1040    ACTIVE syspool

2. look at the block usage stats, for example
        # echo ffffff01c52b1040::zfs_blkstats | mdb -k
        Dittoed blocks on same vdev: 4541
        
        Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
             1    16K      1K   3.00K   3.00K   16.00     0.00  object directory
             3  1.50K   1.50K   4.50K   1.50K    1.00     0.00  object array
           163  19.8M   1.46M   4.39M   27.6K   13.52     0.28  bpobj
           336  1.79M    724K   2.12M   6.46K    2.53     0.13  SPA space map
        …
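
   If you'd rather not poke at the kernel with mdb, I believe zdb can
   produce a very similar per-type breakdown by walking the pool (this
   takes roughly as long as a scrub), for example
        # zdb -bb geek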


3. compare the block usage stats for the various pools
        Blocks = number of blocks of that type
        LSIZE = logical size
        PSIZE = physical size, after compression
        ASIZE = allocated size, how much disk space is used (including raidz & copies)
        avg = average allocated size per block
        comp = compression ratio (LSIZE:PSIZE)
        %Total = percent of total allocated space
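
As a quick sanity check on the columns, take the SPA space map line above:
2.12M allocated over 336 blocks is about 6.46K per block, matching the avg
column, and 1.79M logical vs 724K physical is the 2.53 compression ratio
shown.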

It should be obvious that ashift = 9 for the above example: allocations of
3.00K and 1.50K per block are odd multiples of 512 bytes, which could not
occur if the minimum allocation were 4KB.
 -- richard
