On 25/04/2024 00:47, Martin Simmons wrote:
On Wed, 24 Apr 2024 23:40:31 +1000, Gary R Schmidt said:

On 24/04/2024 22:33, Gary R. Schmidt wrote:
On 24/04/2024 21:30, Roberto Greiner wrote:

On 24/04/2024 04:30, Radosław Korzeniewski wrote:
Hello,

On Tue, 23 Apr 2024 at 13:33, Roberto Greiner <mrgrei...@gmail.com>
wrote:


     On 23/04/2024 04:34, Radosław Korzeniewski wrote:
     Hello,

     On Wed, 17 Apr 2024 at 14:01, Roberto Greiner <mrgrei...@gmail.com>
     wrote:


          The error is at the end of the page, where it says that you
          can see how much space is being used with 'df -h'.  The
          problem is that df can't actually see the space gained from
          dedup; it shows how much would be used without dedup.


     This command (df -h) shows how much allocated and free space is
     available on the filesystem.  So when you have a dedup ratio of
     20:1 and you wrote 20TB, your df command shows 1TB allocated.

     But that is the exact problem I had. df did NOT show 1TB
     allocated. It indicated 20TB allocated (yes, in ZFS).

I have not used ZFS dedup for a long time (I'm a ZFS user from the
first beta in Solaris), so I'm curious - if your zpool is 2TB in size
and you have a 20:1 dedup ratio with 20TB saved and 1TB allocated,
what does df show for you?
Something like this?
Size: 2TB
Used: 20TB
Avail: 1TB
Use%: 2000%

No, the values are quite different. I wrote 20TB to stay with the
example previously given. My actual numbers are:

df: 2.9TB used
zpool list: 862GB used, 3.4x dedup level.
Actual partition size: 7.2TB

You use zpool list to examine filespace.
Or zfs list.
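
Just to connect those numbers: 862GB physically allocated times the
3.4x dedup ratio is roughly 2.9TB, which is what df reports - so df is
giving the logical figure.  A minimal sketch of the two views (assuming
a hypothetical pool called "tank"; substitute your own pool name):

$ # pool-level view: physical allocation and the dedup ratio
$ zpool list -o name,size,allocated,free,dedupratio tank
$ # dataset-level view: logical usage, which is what df also reports
$ zfs list -r -o name,used tank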

On FreeBSD at least, zfs list will show the same as df (i.e. will include all
copies of the deduplicated data in the USED column).

I think the reason is that deduplication is done at the pool level, so there
is no single definition of which dataset owns each deduplicated block.  As a
result, the duplicates have to be counted multiple times.  This is different
from a cloned dataset, where the original dataset owns any blocks that are
shared.
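
One way to see that in practice (again assuming a hypothetical pool
called "tank") is to compare the pool's physical allocation with the
root dataset's USED figure, which counts every copy of the
deduplicated data:

$ # physical allocation at the pool level, after dedup
$ zpool list -H -o allocated tank
$ # logical usage of the whole dataset tree, counting deduplicated copies in full
$ zfs get -H -o value used tank

On a pool with a significant dedup ratio the second number should be
noticeably larger than the first.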
That's correct, zfs list gives the logical filespace in use.  Sorry.

If you do "zfs get used,compressratio filesystem" then you can play with the values returned...

$ for i in `zfs list -r zpool | sed 1d | awk '{print $1}'`
do
        zfs get used,compressratio $i | sed 1d
done
gives a list of very interesting numbers.  :-)
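
If it helps, zfs get can also walk the tree itself, so the same numbers
come out of a single command (the -H/-o flags just make the output
easier to post-process):

$ zfs get -r -H -o name,property,value used,compressratio zpool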

        Cheers,
                Gary    B-)


