On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:

> Tom Hall <thattommyh...@gmail.com> writes:
>
>> If you enable it after data is on the filesystem, will it find the
>> dupes on read as well as write? Would a scrub therefore make sure the
>> DDT is fully populated?
>
> no. only written data is added to the DDT, so you need to copy the data
> somehow. zfs send/recv is the most convenient, but you could even do a
> loop of commands like
>
> cp -p "$file" "$file.tmp" && mv "$file.tmp" "$file"
>
>> Re the DDT, can someone outline its structure please? Some sort of
>> hash table? The blogs I have read so far don't specify.
>
> I can't help here.

UTSL (use the source, Luke).
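If you do go the rewrite-in-place route instead of send/recv, a rough,
untested expansion of that one-liner over a whole filesystem might look
like this (the /tank/home path is only an example; it assumes there is
enough free space for a temporary copy of the largest file and that
nothing has the files open while they are being rewritten):

  # rewrite every file so its blocks pass through the dedup-enabled write path
  find /tank/home -type f -exec sh -c '
      for f in "$@"; do
          cp -p "$f" "$f.tmp" && mv "$f.tmp" "$f"
      done
  ' sh {} +

Note that depending on your cp, ACLs and extended attributes may not
survive the copy, and sparse files may come back fully allocated, so
zfs send/recv into a dedup-enabled dataset is still the safer way to
repopulate the DDT.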
>> Re DDT size, is (data in use)/(av blocksize) * 256 bit right as a worst
>> case (i.e. all blocks non-identical)?
>
> the size of an entry is much larger:
>
> | From: Mertol Ozyoney <mertol.ozyo...@sun.com>
> | Subject: Re: Dedup memory overhead
> | Message-ID: <00cb01caa580$a3d6f110$eb84d330$%ozyo...@sun.com>
> | Date: Thu, 04 Feb 2010 11:58:44 +0200
> |
> | Approximately it's 150 bytes per individual block.
>
>> What are average block sizes?
>
> as a start, look at your own data. divide the used size in "df" by the
> used inodes in "df -i". example from my home directory:
>
> $ /usr/gnu/bin/df -i ~
> Filesystem            Inodes    IUsed      IFree IUse% Mounted on
> tank/home          223349423  3412777  219936646    2% /volumes/home
>
> $ df -k ~
> Filesystem            kbytes       used      avail capacity  Mounted on
> tank/home          573898752  257644703  109968254      71%  /volumes/home
>
> so the average file size is 75 KiB, smaller than the recordsize of 128
> KiB. extrapolating to a full filesystem, we'd get 4.9M files.
> unfortunately, it's more complicated than that, since a file can consist
> of many records even if the *average* is smaller than a single record.
>
> a pessimistic estimate, then, is one record for each of those 4.9M
> files, plus one record for each 128 KiB of diskspace (2.8M), for a total
> of 7.7M records. at roughly 150 bytes per entry, the size of the DDT for
> this (quite small!) filesystem would be something like 1.2 GB. perhaps a
> reasonable rule of thumb is 1 GB of DDT per TB of storage.

"zdb -D poolname" will provide details on the DDT size. FWIW, I have a
pool with 52M DDT entries and the DDT is around 26GB.

$ pfexec zdb -D tank
DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00

(you can tell by the stats that I'm not expecting much dedup :-)
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss