Tom Hall <thattommyh...@gmail.com> writes:

> If you enable it after data is on the filesystem, it will find the
> dupes on read as well as write? Would a scrub therefore make sure the
> DDT is fully populated.

no.  entries are only added to the DDT when data is written, so existing
data has to be rewritten somehow.  zfs send/recv is the most convenient
way, but you could even do a loop of commands like

  cp -p "$file" "$file.tmp" && mv "$file.tmp" "$file"
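
for example, the whole loop might look something like this (an untested
sketch; I'm assuming the filesystem is mounted at /volumes/home as in
the df output below.  note that rewriting files in place like this
breaks hard links, needs temporary space for a second copy of each
file, and won't free the old blocks while snapshots still reference
them):

  find /volumes/home -xdev -type f -exec sh -c '
      for f do
          # rewrite each regular file so its new blocks go through
          # dedup and get entries in the DDT
          cp -p -- "$f" "$f.tmp" && mv -- "$f.tmp" "$f"
      done' sh {} +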

> Re the DDT, can someone outline its structure please? Some sort of
> hash table? The blogs I have read so far don't specify.

I can't help here.

> Re DDT size, is (data in use)/(avg blocksize) * 256 bits right as a
> worst case (i.e. all blocks non-identical)?

the size of an entry is much larger than the 32 bytes (256 bits) of the
checksum alone:

| From: Mertol Ozyoney <mertol.ozyo...@sun.com>
| Subject: Re: Dedup memory overhead
| Message-ID: <00cb01caa580$a3d6f110$eb84d330$%ozyo...@sun.com>
| Date: Thu, 04 Feb 2010 11:58:44 +0200
| 
| Approximately it's 150 bytes per individual block.

> What are average block sizes?

as a start, look at your own data.  divide the used size from "df" by
the used inode count from "df -i".  an example from my home directory:

  $ /usr/gnu/bin/df -i ~
  Filesystem            Inodes     IUsed     IFree  IUse%    Mounted on
  tank/home          223349423   3412777 219936646     2%    /volumes/home

  $ df -k ~
  Filesystem            kbytes      used     avail capacity  Mounted on
  tank/home          573898752 257644703 109968254    71%    /volumes/home
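
dividing the two used figures (a quick shell calculation):

  $ echo $((257644703 / 3412777))      # used KiB / used inodes
  75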

so the average file size is about 75 KiB, smaller than the recordsize of
128 KiB.  extrapolating to a full filesystem (scaling the 3.4M files up
from 71% to 100% capacity), we'd get 4.9M files.  unfortunately, it's
more complicated than that, since a file can consist of many records
even if the *average* file is smaller than a single record.

a pessimistic estimate, then, is one record for each of those 4.9M
files, plus one record for each 128 KiB of diskspace (2.8M), for a total
of 7.7M records.  at roughly 150 bytes per entry (the figure quoted
above), the DDT for this (quite small!) filesystem would be something
like 1.2 GB.  perhaps a reasonable rule of thumb is 1 GB of DDT per TB
of storage.
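
spelled out with the same numbers (just shell arithmetic, reusing the
figures from the df output and the 150 bytes per entry quoted above):

  # records from disk space: (used + avail) KiB / 128 KiB per record
  $ echo $(( (257644703 + 109968254) / 128 ))
  2871976

  # DDT size: (4.9M per-file records + 2.8M per-block records) * 150 bytes
  $ echo $(( 7700000 * 150 ))
  1155000000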

(disclaimer: I'm not a kernel hacker, I just read this list :-)
-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

