Deduplication handles sparse files to some extent, since all blocks containing 
only zeros get mapped to the same storage.

As soon as one of those blocks sharing storage is written to, the new data is 
written to a new block, and the usage counter of the shared block is reduced 
by one. Once the usage count reaches zero, the block is flagged for reuse. At 
least that is how it seems to work in the NetApp WAFL file system. WAFL never 
rewrites a block in place; it always writes to a new location. I don't know 
about OneFS.
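
To make the idea concrete, here is a rough toy sketch in Python of the 
behaviour described above. It is not how WAFL or OneFS is actually 
implemented. The names DedupStore, File and BLOCK_SIZE, the 4K block size 
and the use of SHA-256 to identify identical blocks are all assumptions made 
up for this illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class DedupStore:
    """Hypothetical block store: identical blocks share one physical copy."""

    def __init__(self):
        self.blocks = {}    # digest -> block contents (the "physical" storage)
        self.refcount = {}  # digest -> number of logical blocks mapped to it
        self.free = []      # digests of blocks flagged for reuse

    def put(self, data):
        """Store a block, sharing storage if identical content already exists."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def release(self, digest):
        """Drop one reference; flag the block for reuse when none remain."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blocks[digest]
            self.free.append(digest)


class File:
    """Hypothetical file: a list of references into the shared store."""

    def __init__(self, store, nblocks):
        self.store = store
        zero = bytes(BLOCK_SIZE)
        # Every all-zero block maps to the same physical block.
        self.map = [store.put(zero) for _ in range(nblocks)]

    def write_block(self, index, data):
        """Never overwrite in place: map the new data, then release the old."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        old = self.map[index]
        self.map[index] = self.store.put(data)
        self.store.release(old)


store = DedupStore()
f = File(store, 1000)               # 1000 logical zero blocks
physical_before = len(store.blocks)  # one shared physical block
f.write_block(3, b"\x01" * BLOCK_SIZE)
physical_after = len(store.blocks)   # the zero block plus one new block
```

In this sketch a "sparse" file of 1000 zero blocks costs one physical block, 
and writing to one of them adds a second block while the shared zero block 
just loses one reference.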

---- Rick Stevens wrote ----

>On 04/01/2015 10:57 AM, Ranjan Maitra wrote:
>> Thanks!
>>
>>
>>> That's EMC's "OneFS" filesystem (EMC bought out Isilon).
>>>
>>>> On Wed, 1 Apr 2015 08:07:34 -0500 Ranjan Maitra 
>>>> <maitra.mbox.igno...@inbox.com> wrote:
>>>>
>>>>> Thanks to both Cameron and you, Bob!
>>>>>
>>>>> After the transfer, here is what we have, on that filesystem:
>>>>>
>>>>> $ du -sh kmeans --apparent-size
>>>>> 154G      kmeans
>>>>>
>>>>> $ du -sh kmeans
>>>>> 628G      kmeans
>>>>>
>>>>> So, I guess that leaves me (and others) stuck.
>>>
>>> Is "kmeans" on the target or the source filesystem?
>>
>> Sorry, this is on the target (Isilon FS). Locally (on a F21 workstation and 
>> ext4 FS) it clocks in at 154G and 159G respectively.
>>
>>> If it's the source,
>>> keep in mind that OneFS can do data dedupes (assuming it's enabled),
>>> but it is a NAS device (NFS and/or SMB). I don't believe it's capable
>>> of sparse files (few NAS are). The data dedupe would reduce the actual
>>> storage on disk on the EMC device, but not report it as a sparse
>>> filesystem.
>>
>>
>> Yes, I have been given this explanation, as well as that the block size is 
>> turned up on the Isilon. This means that the minimum space a single file 
>> occupies is probably 16K, rather than the typical 4K on a desktop. However, 
>> I do not have files that are that small where it would make a difference. 
>> So, I don't know.
>>
>> I see: the dedupe is supposed to run over weekends but I am not sure what it 
>> does.
>
>Deduping is a process by which redundant data on a storage device is
>removed. You can loosely think of it as "gzip" at the block level on
>the storage device itself (although gzip is _compression_, not
>deduping). Everything on the device will _appear_ normal, but the
>redundancies will have been removed and less physical space used.
>
>Here's a good explanation:
>
>       http://www.webopedia.com/TERM/D/data_deduplication.html
>
>----------------------------------------------------------------------
>- Rick Stevens, Systems Engineer, AllDigital    ri...@alldigital.com -
>- AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
>-                                                                    -
>-       A squeegee, by any other name, wouldn't sound as funny.      -
>----------------------------------------------------------------------
>-- 
>users mailing list
>users@lists.fedoraproject.org
>To unsubscribe or change subscription options:
>https://admin.fedoraproject.org/mailman/listinfo/users
>Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
>Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
>Have a question? Ask away: http://ask.fedoraproject.org
