> -----Original Message-----
> From: Erik Trimble [mailto:erik.trim...@oracle.com]
> Sent: Tuesday, April 26, 2011 12:47
> To: Ian Collins
> Cc: Fred Liu; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?
>
> On 4/25/2011 6:23 PM, Ian Collins wrote:
> > On 04/26/11 01:13 PM, Fred Liu wrote:
> >> Hmmmm, it seems dedup is pool-based, not filesystem-based.
> > That's correct, although it can be turned off and on at the
> > filesystem level (assuming it is enabled for the pool).
> Which is effectively the same as choosing per-filesystem dedup, just
> the inverse. You turn it on at the pool level and off at the
> filesystem level, which is identical to the "off at the pool level,
> on at the filesystem level" that NetApp does.
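(For anyone following along, the per-filesystem switching Erik describes
looks roughly like this; 'tank' and 'tank/fs1' below are just placeholder
names:)

    # dedup is set per dataset, but the dedup table (DDT) is pool-wide
    zfs set dedup=on tank          # root dataset; inherited by children
    zfs set dedup=off tank/fs1     # override for a single filesystem

    # the inverse, resembling NetApp's per-volume style:
    zfs set dedup=off tank
    zfs set dedup=on tank/fs1

    # the pool-wide ratio is visible directly:
    zpool list tank                # includes a DEDUP column
    zpool get dedupratio tank
    zdb -DD tank                   # DDT statistics, on builds whose zdb supports -D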
My original thought was to enable dedup on just one file system, to check
whether it is mature enough for our production environment (I have only
one pool). If dedup were filesystem-based, its effects would be throttled
within that one file system and would not propagate to the whole pool.
Simply disabling dedup cannot get rid of all of the effects (such as
possible performance degradation, etc.), because the already-deduped data
is still there and the DDT is still there. The only thorough way I can
think of is to remove all of the deduped data entirely. But is that really
a thorough fix?

The dedup space saving is also somewhat indirect. We cannot directly get
the space saving in the file system where dedup is enabled, because the
accounting is pool-based. Even from the pool's perspective it is still
indirect and obscure, in my opinion: the real space saving is the absolute
delta between the output of 'zpool list' and the sum of 'du' over all the
folders in the pool (or 'df' on the mount-point folder; I am not sure
whether a percentage like 123% can occur... grinning ^:^). On NetApp, by
contrast, we can use 'df -s' to get the space saving directly and easily.

> >> If it could have fine-grained granularity (like per file system),
> >> that would be great! It is a pity! NetApp is sweet in this aspect.
> >>
> > So what happens to user B's quota if user B stores a ton of data
> > that is a duplicate of user A's data and then user A deletes the
> > original?
> Actually, right now, nothing happens to B's quota. He's always
> charged the un-deduped amount for his quota usage, whether or not
> dedup is enabled, and regardless of how much of his data is actually
> deduped. Which is as it should be, as quotas are about limiting how
> much a user is consuming, not how much the backend needs to store
> that data.
>
> e.g.
>
> A, B, C, & D each have 100MB of data in the pool, with dedup on.
>
> 20MB of storage has a dedup factor of 3:1 (common to A, B, & C)
> 50MB of storage has a dedup factor of 2:1 (common to A & B)
>
> Thus, the amount of unique data would be:
>
> A: 100 - 20 - 50 = 30MB
> B: 100 - 20 - 50 = 30MB
> C: 100 - 20 = 80MB
> D: 100MB
>
> Summing it all up, you would have an actual storage consumption of
> 70MB (the 20MB + 50MB of deduped data, stored once each) plus
> 30 + 30 + 80 + 100 = 240MB of unique data, i.e. 310MB of actual
> storage for 400MB of apparent storage (a dedup ratio of 1.29:1).
>
> A, B, C, & D would each still have a quota usage of 100MB.

True -- quota is in charge of logical data, not physical data. Let's
assume an interesting scenario: say the pool is 100% full in logical data
('df' tells you 100% is used) but not full in physical data ('zpool list'
tells you some space is still available). Can we continue writing data
into this pool? Is anybody interested in trying this experiment? (A rough
sketch is in the P.S. below.) ;-)

Thanks.

Fred
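P.S. A rough sketch of that experiment, using a throwaway file-backed
pool (the pool name, file path, and sizes below are just placeholders,
and the exact 'df' behavior may vary across ZFS builds):

    # create a small scratch pool backed by a file, with dedup enabled
    mkfile 256m /var/tmp/dduptest.img
    zpool create dduptest /var/tmp/dduptest.img
    zfs set dedup=on dduptest

    # write one chunk of random data, then duplicate it repeatedly;
    # logical usage should grow much faster than physical usage
    dd if=/dev/urandom of=/dduptest/seed bs=1024k count=64
    i=1
    while [ $i -le 50 ]; do
        cp /dduptest/seed /dduptest/copy.$i || break   # stop when writes fail
        df -h /dduptest          # logical view
        zpool list dduptest      # physical view, including the DEDUP ratio
        i=`expr $i + 1`
    done

    # clean up
    zpool destroy dduptest
    rm /var/tmp/dduptest.img

If the copies keep succeeding after 'df' reports 100% (or more) used,
that would answer the question above.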