Re: [CentOS] Filesystem that doesn't store duplicate data

[EMAIL PROTECTED] Wed, 05 Dec 2007 17:45:30 -0800

----- Original Message ----- 
From: [EMAIL PROTECTED] 
To: "CentOS Mailing list" <centos@centos.org> 
Sent: Thursday, December 6, 2007 11:18:16 AM (GMT+1000) Australia/Brisbane 
Subject: [CentOS] Filesystem that doesn't store duplicate data


Is there such a filesystem available? It seems like it wouldn't be too hard to 
implement... Basically do things on a block-by-block basis. Store the MD5 of a 
block in a table, and when writing a new block, check if the MD5 already 
exists and then point the new block to the old block. Since MD5 is not 
guaranteed unique, might need to do a diff between the two blocks and if the 
blocks are indeed different, handle it somehow. 
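
A minimal sketch of the write path described above, in Python. The class
name DedupStore, the 4096-byte block size, and the in-memory list standing
in for the disk are all assumptions made for illustration; "handle it
somehow" is filled in here by keeping a list of block ids per digest, which
is only one possible choice.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, not taken from the post


class DedupStore:
    """Toy in-memory block store that keeps one copy of identical blocks."""

    def __init__(self):
        self.blocks = []  # physical blocks; the list index is the block id
        self.index = {}   # MD5 digest -> list of block ids with that digest

    def write(self, data: bytes) -> int:
        """Store a block and return the id of the physical block holding it."""
        digest = hashlib.md5(data).digest()
        candidates = self.index.setdefault(digest, [])
        # MD5 is not guaranteed unique, so compare the actual bytes
        # before pointing the new block at an old one.
        for block_id in candidates:
            if self.blocks[block_id] == data:
                return block_id
        # Either a new digest, or a collision with different content:
        # store the block and chain it under the same digest.
        self.blocks.append(data)
        block_id = len(self.blocks) - 1
        candidates.append(block_id)
        return block_id


store = DedupStore()
first = store.write(b"a" * BLOCK_SIZE)
second = store.write(b"a" * BLOCK_SIZE)  # same content, reuses the first block
third = store.write(b"b" * BLOCK_SIZE)   # different content, new block
```

Here first and second end up as the same block id, and only two physical
blocks are stored for three writes.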

When modifying an existing block that has multiple pointers, copy the block and 
modify the new block. 
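
The copy-on-modify step could look roughly like the following Python
sketch. It assumes a reference count per physical block; the names
CowBlocks, alloc, share and modify are made up for this example and do not
come from any existing filesystem.

```python
class CowBlocks:
    """Toy block table with reference counts and copy-on-write."""

    def __init__(self):
        self.data = {}    # physical block id -> bytes
        self.refs = {}    # physical block id -> number of pointers to it
        self.next_id = 0

    def alloc(self, content: bytes) -> int:
        """Store a new physical block with a single pointer to it."""
        pid = self.next_id
        self.next_id += 1
        self.data[pid] = content
        self.refs[pid] = 1
        return pid

    def share(self, pid: int) -> int:
        """Add another pointer to an existing block (a deduplicated write)."""
        self.refs[pid] += 1
        return pid

    def modify(self, pid: int, offset: int, patch: bytes) -> int:
        """Overwrite part of a block; returns the id the caller should use."""
        old = self.data[pid]
        new = old[:offset] + patch + old[offset + len(patch):]
        if self.refs[pid] == 1:
            # Only one pointer: safe to change the block in place.
            self.data[pid] = new
            return pid
        # Shared block: drop our pointer and write the change to a copy,
        # leaving the original untouched for the other pointers.
        self.refs[pid] -= 1
        return self.alloc(new)


table = CowBlocks()
original = table.alloc(b"hello world")
alias = table.share(original)             # two pointers, one physical block
changed = table.modify(alias, 0, b"J")    # shared, so this makes a copy
```

In a real design the hash table would also have to be updated whenever a
block is changed in place, which this sketch leaves out.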

I know I'm oversimplifying things a lot, but something like this could work, 
no? Would be a great filesystem to store backups on, or things like VMware 
volumes... 

Russ 
_______________________________________________ 
CentOS mailing list 
CentOS@centos.org 
http://lists.centos.org/mailman/listinfo/centos 

-- 
This message has been scanned for viruses and 
dangerous content by MailScanner, and is 
believed to be clean. 


You are describing what I understand to be "data de-duplication". It is all the 
rage for backups, as it has the potential to decrease backup times and volumes 
by significant amounts. I went to a presentation by Avamar (a partner of EMC?) 
regarding this technology, and it seemed really nice for your typical Windows 
file server. I suppose it effectively turns your data into "single-instance" 
storage, which is no bad thing. I suppose it could be useful for large database 
backups as well. 

You'd think that using this technology on a live filesystem could incur a 
significant performance penalty due to all those calculations (FUSE module, 
anyone?). Imagine a hardware-optimized data de-duplication disk controller, 
similar to RAID XOR-optimized CPUs. Now that would be cool. All it would need 
to store is metadata when it had already seen the exact same block. I think 
fundamentally it is similar in result to on-the-fly disk compression. 

Let us know when you have a beta to test! 

8^) 






