On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:

> Tom Hall <thattommyh...@gmail.com> writes:
>
>> If you enable it after data is on the filesystem, will it find the
>> dupes on read as well as write? Would a scrub therefore make sure the
>> DDT is fully populated?
>
> no. only written data is added to the DDT, so you need to copy the data
> somehow. zfs send/recv is the most convenient, but you could even do a
> loop of commands like
>
> cp -p "$file" "$file.tmp" && mv "$file.tmp" "$file"
>
>> Re the DDT, can someone outline its structure please? Some sort of
>> hash table? The blogs I have read so far don't specify.
>
> I can't help here.

UTSL (use the source, Luke).
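If you do go the rewrite-in-place route instead of send/recv, a rough,
untested expansion of that one-liner over a whole filesystem might look
like this (the /tank/home path is only an example; it assumes there is
enough free space for a temporary copy of the largest file and that
nothing has the files open while they are being rewritten):

  # rewrite every file so its blocks pass through the dedup-enabled write path
  find /tank/home -type f -exec sh -c '
      for f in "$@"; do
          cp -p "$f" "$f.tmp" && mv "$f.tmp" "$f"
      done
  ' sh {} +

Note that depending on your cp, ACLs and extended attributes may not
survive the copy, and sparse files may come back fully allocated, so
zfs send/recv into a dedup-enabled dataset is still the safer way to
repopulate the DDT.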
>> Re DDT size, is (data in use)/(av blocksize) * 256 bit right as a worst
>> case (i.e. all blocks non-identical)?
>
> the size of an entry is much larger:
>
> | From: Mertol Ozyoney <mertol.ozyo...@sun.com>
> | Subject: Re: Dedup memory overhead
> | Message-ID: <00cb01caa580$a3d6f110$eb84d330$%ozyo...@sun.com>
> | Date: Thu, 04 Feb 2010 11:58:44 +0200
> |
> | Approximately it's 150 bytes per individual block.
>
>> What are average block sizes?
>
> as a start, look at your own data. divide the used size in "df" by the
> used inodes in "df -i". example from my home directory:
>
> $ /usr/gnu/bin/df -i ~
> Filesystem            Inodes    IUsed      IFree IUse% Mounted on
> tank/home          223349423  3412777  219936646    2% /volumes/home
>
> $ df -k ~
> Filesystem            kbytes       used      avail capacity  Mounted on
> tank/home          573898752  257644703  109968254      71%  /volumes/home
>
> so the average file size is 75 KiB, smaller than the recordsize of 128
> KiB. extrapolating to a full filesystem, we'd get 4.9M files.
> unfortunately, it's more complicated than that, since a file can consist
> of many records even if the *average* is smaller than a single record.
>
> a pessimistic estimate, then, is one record for each of those 4.9M
> files, plus one record for each 128 KiB of diskspace (2.8M), for a total
> of 7.7M records. at roughly 150 bytes per entry, the size of the DDT for
> this (quite small!) filesystem would be something like 1.2 GB. perhaps a
> reasonable rule of thumb is 1 GB of DDT per TB of storage.

"zdb -D poolname" will provide details on the DDT size. FWIW, I have a
pool with 52M DDT entries and the DDT is around 26GB.

$ pfexec zdb -D tank
DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

(you can tell by the stats that I'm not expecting much dedup :-)
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss