Dennis Clarke wrote:
On Sat, 2009-11-07 at 17:41 -0500, Dennis Clarke wrote:
Does the dedupe functionality happen at the file level or a lower block
level?
It occurs at the block allocation level.
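For example, dedup is a per-dataset property and would be switched on with something like this (the zp_dd/tester dataset name here is only an illustration):
$ zfs set dedup=on zp_dd/tester
Only blocks written after the property is set participate in dedup; existing data is not rewritten.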
I am writing a large number of files that have the following structure:
------ file begins
1024 lines of random ASCII chars 64 chars long
some tilde chars .. about 1000 of them
some text ( English ) for 2K
more text ( English ) for 700 bytes or so
------------------
ZFS's default block size is 128K and is controlled by the "recordsize"
filesystem property. Unless you changed "recordsize", each of the files
above would be a single block distinct from the others.
You may or may not get better dedup ratios with a smaller recordsize,
depending on how the common parts of the file line up with block
boundaries.
The cost of the additional indirect blocks might overwhelm the savings
from deduping a small common piece of the file.
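For example, recordsize can be checked and lowered per dataset (the 8K value is only an illustration, and only files written after the change use the new size):
$ zfs get recordsize zp_dd/tester
$ zfs set recordsize=8K zp_dd/tester
A smaller recordsize means more blocks per file, and therefore more block pointers and indirect blocks to store, which is where that overhead comes from.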
- Bill
Well, I was curious about this sort of thing and figured that a simple
test would show me the behavior.
Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2
directories named [a-z][a-z], where each file is 64K of random
non-compressible data followed by some English text.
I guess I was wrong about that 64K chunk being non-compressible .. because I
wrote it out as chars from the set { [A-Z][a-z][0-9] } and thus ..
compressible ASCII data as opposed to random binary data.
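Roughly speaking, the generation loop looked something like this (a sketch only, assuming bash and a GNU userland; english-2k.txt and english-700b.txt stand in for the English text portions, and the real script differed in detail):
$ for d in {a..z}{a..z}; do
>   mkdir -p /tester/foo/1/$d
>   for f in {a..z}{a..z}; do
>     { head -c 400000 /dev/urandom | tr -dc 'A-Za-z0-9' | head -c 65536 | fold -w 64
>       printf '~%.0s' {1..1000}; echo
>       cat english-2k.txt english-700b.txt
>     } > /tester/foo/1/$d/$f.dat
>   done
> done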
So ... after doing that a few times I now see something fascinating:
$ ls -lo /tester/foo/*/aa/aa.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:38 /tester/foo/1/aa/aa.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:45 /tester/foo/2/aa/aa.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:43 /tester/foo/3/aa/aa.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:43 /tester/foo/4/aa/aa.dat
$ ls -lo /tester/foo/*/zz/az.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:39 /tester/foo/1/zz/az.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:47 /tester/foo/2/zz/az.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:45 /tester/foo/3/zz/az.dat
-rw-r--r-- 1 dclarke 68330 Nov 7 22:47 /tester/foo/4/zz/az.dat
$ find /tester/foo -type f | wc -l
70304
Those files, all 70,000+ of them, are unique and smaller than the
filesystem blocksize.
However:
$ zfs get used,available,referenced,compressratio,recordsize,compression,dedup zp_dd/tester
NAME          PROPERTY       VALUE  SOURCE
zp_dd/tester  used           4.51G  -
zp_dd/tester  available      3.49G  -
zp_dd/tester  referenced     4.51G  -
zp_dd/tester  compressratio  1.00x  -
zp_dd/tester  recordsize     128K   default
zp_dd/tester  compression    off    local
zp_dd/tester  dedup          on     local
Compression factors don't interest me at the moment .. but see this:
$ zpool get all zp_dd
NAME   PROPERTY       VALUE                 SOURCE
zp_dd  size           67.5G                 -
zp_dd  capacity       6%                    -
zp_dd  altroot        -                     default
zp_dd  health         ONLINE                -
zp_dd  guid           14649016030066358451  default
zp_dd  version        21                    default
zp_dd  bootfs         -                     default
zp_dd  delegation     on                    default
zp_dd  autoreplace    off                   default
zp_dd  cachefile      -                     default
zp_dd  failmode       wait                  default
zp_dd  listsnapshots  off                   default
zp_dd  autoexpand     off                   default
zp_dd  dedupratio     1.95x                 -
zp_dd  free           63.3G                 -
zp_dd  allocated      4.22G                 -
The dedup ratio has climbed to 1.95x with all those unique files, each
smaller than the 128K recordsize.
You can get more dedup information by running 'zdb -DD zp_dd'. This
should show you how we break things down. Add more 'D' options to get
even more detail.
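For example (usage only; what each level prints is roughly as described in the zdb man page):
$ zdb -D zp_dd       # summary of DDT entries and the overall dedup ratio
$ zdb -DD zp_dd      # adds a histogram of blocks bucketed by reference count
$ zdb -DDD zp_dd     # per-DDT statistics
$ zdb -DDDD zp_dd    # dumps the individual DDT entries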
- George