That assumes the compression occurs file by file. Is that true, or is it done per transaction? I suppose it is done on the files themselves, so all clients would compress a given file into the same set of bits. If it isn't, though, your high dedup rates won't be realized.
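A rough way to see why the granularity matters is the sketch below. It is only an illustration, not a description of what TSM actually does: it assumes zlib as the compressor and fixed 256-byte chunks as a crude stand-in for a dedup engine, and the file contents and names such as `shared_file` are made up.

```python
import hashlib
import zlib


def chunk_hashes(data: bytes, size: int = 256) -> set:
    """Hash fixed-size chunks (a simplified stand-in for dedup chunking)."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


# Hypothetical file that exists unchanged on two different clients.
shared_file = b"".join(
    b"record %06d: common operating system file\n" % i for i in range(2000)
)

# Case 1: each client compresses the file on its own.
client_a = zlib.compress(shared_file, 6)
client_b = zlib.compress(shared_file, 6)
per_file_identical = client_a == client_b

# Case 2: each client compresses a whole transaction, so the shared file
# follows different client-specific data in each compressed stream.
txn_a = zlib.compress(b"client A private data\n" * 50 + shared_file, 6)
txn_b = zlib.compress(b"other data that lives on client B\n" * 80 + shared_file, 6)
shared_chunks = chunk_hashes(txn_a) & chunk_hashes(txn_b)

print("per-file output identical:", per_file_identical)
print("chunks shared between transaction streams:", len(shared_chunks))
```

In the per-file case both clients produce byte-identical output, so any hash-based dedup would match them. In the per-transaction case the shared file is buried in a different compressed stream on each client, and the chunk hashes are not expected to line up.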
Kelly Lipp
Chief Technical Officer
www.storserver.com
719-266-8777 x7105
STORServer solves your data backup challenges. Once and for all.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of Grigori Solonovitch
Sent: Saturday, November 07, 2009 9:16 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] de-duplicating compressed data

>>>What is the effect of compression on de-duplication? Does it help to reach a
>>>more de-duplication level?

This is my opinion (please correct me if something is wrong):

1) Note that we are talking about client compression (compression=yes for the node or in dsm.opt). Hardware compression at the drive level is totally independent of the dedup process;
2) Client compression can be used with any primary storage pool (device type DISK, FILE or any tape). In this case compressed data goes to the copy pools as well, so you need fewer tapes in the copy pools;
3) Client compression takes time during backups (backups run much longer), but the amount of data sent to the TSM server over the network is much smaller (the average compression rate is 2 to 4 times);
4) Deduplication works only with primary sequential disk storage pools (device class FILE) and can give a compression rate of 10-20 or more. The deduplication process works with data from all nodes (not only from one) and compares ALL to ALL. So just imagine what compression rate you can reach in some cases, when there are a lot of similar Windows servers (like a server in each bank branch) with the same level of Windows and the same applications. For 50 branches you can have a compression rate of 40;
5) I see only one reason why deduplication works only with FILE and not with DISK: after software deduplication you need to run reclamation to release space, and reclamation is not applicable to random-access DISK. By the way, this question is still open, and only IBM can answer what the real reason is;
6) There is special protection for data in the TSM server.
Deduplication does not process data if there are fewer than 2 copies on tape. So the sequence of actions is: back up data to DISK, make at least 2 copies of the data to tape (without deduplication!), start deduplication, and start reclamation. Deduplication will never reduce data in copy pools;
7) Deduplication and compression work together: the overall compression rate will be higher than with compression alone, but much lower than with deduplication alone. For example, you will have compression rate N for compression only (backups and all copies), M for deduplication only (backups only; copies keep their full size), and K for compression plus deduplication (K for backups and N for copies). In general, N is much less than M, and K is more than N and less than M. Real values for N, M and K depend on the type of data.

Regards,
Grigori
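To put numbers on points 4 and 7, here is a small back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 30 TB of logical data, the rates N=3, M=15 and K=6, the 2 copy pools, and the simple model in which each branch server holds a small unique fraction of data on top of content common to all servers. None of these values come from measurements.

```python
def storage_footprint(logical_tb, n, m, k, copies=2):
    """Total stored TB (primary pool plus copy pools) for each strategy.

    n: rate with compression only (applies to backups and copies)
    m: rate with deduplication only (applies to backups; copies stay full size)
    k: rate with both (k applies to backups, n applies to copies)
    """
    return {
        "compression only": logical_tb / n + copies * logical_tb / n,
        "deduplication only": logical_tb / m + copies * logical_tb,
        "compression + deduplication": logical_tb / k + copies * logical_tb / n,
    }


def branch_dedup_rate(servers, unique_fraction):
    """Dedup rate when each server has `unique_fraction` of its data unique
    and the rest is identical across all servers (a simplified model)."""
    stored = (1.0 - unique_fraction) + servers * unique_fraction
    return servers / stored


# Hypothetical values chosen so that N < K < M.
totals = storage_footprint(logical_tb=30.0, n=3.0, m=15.0, k=6.0)
for name, tb in totals.items():
    print(f"{name}: {tb:.1f} TB")

# How little unique data per branch a rate of about 40 would imply.
print(f"50 branches, 0.5% unique each: rate {branch_dedup_rate(50, 0.005):.1f}")
```

With these made-up values the copy pools dominate the total, which is why deduplication alone gives the smallest primary pool but not the smallest overall footprint. Under the same simple model, a rate of about 40 across 50 branches would require each server to hold only around half a percent of unique data.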