Greetings,

Hopefully this will not be too much traffic about the same topic. A zillion vendors are jumping into the dedupe market because of the huge opportunity to sell products in this space, and not all products are created equal. Ask questions (or get references from existing customers) to find out what ongoing support has been like, what problems or maintenance issues have arisen, and how the vendor handled them.
In a normal tape environment or virtual tape library, each backup you do creates a separate copy of your data, at least of the changed parts. For data that changes often, you may have a dozen versions of that data on different media. And presumably you are also creating a daily offsite copy. In other words, you have redundant, multiple copies of the data on separate media. This is necessary because no media is perfect.

In a dedupe appliance, that is exactly what you don't have. The dedupe process guarantees that only one copy is kept of each unique block of data. If a given block is lost to corruption or media failure, then potentially every file that contains that block is lost along with it. The people who design these products therefore build them to mitigate this potential loss by:

- Striping data across multiple disks, multiple RAID sets, and sometimes (as in the case of Avamar) even across multiple nodes in the grid.
- Building integrity checking into various layers of their protocol, so that incoming data is proven clean as it is received.
- Systematically integrity-checking data as it resides on disk. The better designs do a full scan and check of all data every 24 hours or so.
- Integrity-checking during replication, so any corruption won't get transferred to the remote copy.

These are the kinds of features that didn't exist in early dedupe products, where any corruption due to a failure of the disk or firmware in the array could be catastrophic. But many dedupe products today have a healthy paranoia about the reliability of hardware and protect themselves accordingly. So when evaluating dedupe products, be sure to ask about these sorts of features; the low-end products often lack them.

Best Regards,
John D.
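The single-copy risk John describes, and the scrubbing that mitigates it, can be sketched with a toy content-addressed chunk store. This is an illustration only: the class name, the fixed 4-byte chunk size, and SHA-256 as the chunk key are assumptions for the sketch, not any vendor's actual design.

```python
import hashlib

class DedupeStore:
    """Toy content-addressed store: one copy per unique chunk,
    keyed by the chunk's SHA-256 digest (illustrative only)."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def put(self, data: bytes, chunk_size: int = 4) -> list:
        """Split data into fixed-size chunks, store each unique chunk
        once, and return the list of digests (the file's 'recipe')."""
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # stored only once
            recipe.append(digest)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Reassemble a file from its recipe of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def scrub(self) -> list:
        """Periodic integrity check: re-hash every stored chunk and
        report digests that no longer match (silent corruption)."""
        return [d for d, c in self.chunks.items()
                if hashlib.sha256(c).hexdigest() != d]

store = DedupeStore()
r1 = store.put(b"AAAABBBBCCCC")
r2 = store.put(b"AAAABBBBDDDD")  # shares two chunks with the first file
# 6 chunk references written, but only 4 unique chunks actually stored
print(len(r1) + len(r2), len(store.chunks))

# Simulate media corruption of one shared chunk: BOTH files are damaged,
# which is exactly why a daily scrub matters
bad = r1[0]
store.chunks[bad] = b"XXXX"
print(store.scrub() == [bad])
```

Note how corrupting a single shared chunk breaks both files at once, whereas on separate tapes each copy would fail independently.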
Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424  Toll Free: (866) 796-9226  Cell: (314) 750-8721

-------- Original Message --------
Subject: Re: [ADSM-L] Dedupe
From: "Strand, Neil B." <nbstr...@lmus.leggmason.com>
Date: Thu, June 25, 2009 8:09 am
To: ADSM-L@VM.MARIST.EDU

Ditto on Lindsay's "it depends." For my NetApp devices, observed NAS filesystem dedupe ranges from 10% to 70% depending on the data. VMware NFS shares typically show a good ratio. For our VM environment, we split the OS apart from data and paging space, as shown below:

Filesystem              used        saved       %saved
/vol/PROD_VM_OS/        98314436    227793716   70%
/vol/PROD_VM_PAGING/    3107084     1090756     26%
/vol/PROD_VM_DATA1/     11253900    17343096    61%
/vol/DR_VM_OS1/         105852808   236518940   69%
/vol/DR_VM_DATA1/       431134632   216285060   33%
/vol/DR_VM_PAGING1/     35520       4272        11%

The paging space is very dynamic, and I don't expect much savings there. The OS space (where the VM operating systems are installed) is relatively static and redundant, and the high dedupe ratios reflect that. The data space (where applications and everything else live) has a wide variance, as expected. But the end result is that I am saving disk space and actually improving overall performance, because redundant data has a higher probability of residing in cache, and the reference to a particular bit of redundant data has a higher probability of residing in the cached lookup table.

If you are looking for dedupe on tape media, I don't think it is feasible or desirable. Simple compression now allows me to put nearly 3 TB on a single 3592 tape (again, depending on the data). At a nominal cost of $150/tape this works out to about 5 cents/GB. Not too shabby. I make a second offsite copy of the same data, for an overall cost of 10 cents/GB, to provide "five nines" probability that my company's data is recoverable for the next 6 years. This is less than the cost of electricity for disk-based storage over the same period.
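Neil's figures can be checked with two small formulas: the NetApp-style savings percentage is saved / (used + saved), and the tape cost works out from capacity and price. The function names below are illustrative; the numbers are his.

```python
def pct_saved(used_kb: int, saved_kb: int) -> int:
    """Savings as a percentage of what the data WOULD have occupied
    without dedupe: saved / (used + saved)."""
    return round(100 * saved_kb / (used_kb + saved_kb))

def cost_per_gb_cents(tape_cost_usd: float, capacity_tb: float,
                      copies: int = 1) -> float:
    """Media cost in cents/GB for `copies` copies of the data."""
    return copies * tape_cost_usd / (capacity_tb * 1000) * 100

print(pct_saved(98314436, 227793716))           # PROD_VM_OS -> 70
print(pct_saved(3107084, 1090756))              # PROD_VM_PAGING -> 26
print(cost_per_gb_cents(150, 3.0))              # one 3592 copy -> 5.0
print(cost_per_gb_cents(150, 3.0, copies=2))    # primary + offsite -> 10.0
```

The arithmetic confirms the post: a $150 tape holding ~3 TB is 5 cents/GB, and keeping a second offsite copy doubles that to 10 cents/GB.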
Dedupe has its place, as do most technologies. It is not a golden egg unless you force it to be ... and then, when it hatches, it may be a fine goose or it may be a platypus - it depends on your environment.

Cheers,
Neil Strand
Storage Engineer - Legg Mason
Baltimore, MD. (410) 580-7491
Whatever you can do or believe you can, begin it. Boldness has genius, power and magic.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of Ochs, Duane
Sent: Thursday, June 25, 2009 7:35 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Dedupe

In common practice, dedupe is not a tape-oriented process; it is usually used to reduce data on disk. One concern would be the number of tape mounts required to restore data in the event of a DR scenario. As the article stated, there are not many "global" dedupe products yet.

We have been able to implement some dedupe for specific applications, for instance e-mail attachments, and it has worked out fairly well. The primary goal was to reduce the size of the Storage Groups of our Exchange cluster, which sit on tier 1 storage, for DR purposes; the de-duped attachments now sit on tier 2. It reduced our SGs by 1/3. The Exchange SG backups are retained based on legal requirements and replicated; the attachments are not.

I also tested Data Domain and was very unimpressed by the numbers I saw. It had very little impact on our largest bodies of data: imaging, Exchange, and DB dumps. But that is also the hardest type of data to dedupe. My two cents.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of madunix
Sent: Wednesday, June 24, 2009 11:37 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Dedupe

My thought on dedupe: it could be interesting for those who need to decrease the number of tape cartridges, but they could suffer a significant CPU and I/O penalty
for dedupe processing. One issue I was thinking about is failure or corruption: if one part is corrupted, many files could be affected by the loss of a common chunk. And what about encryption - is dedupe compatible with encryption?

Thanks
madunix

>> -----Original Message-----
>> From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf
>> Of lindsay morris
>> Sent: Wednesday, June 24, 2009 1:07 PM
>> To: ADSM-L@VM.MARIST.EDU
>> Subject: Re: [ADSM-L] Dedupe
>>
>> Short and clear answer about de-dupe:
>>
>> It depends.
>>
>> Hope this helps.
>>
>> ------
>> Mr. Lindsay Morris
>> Principal
>> www.tsmworks.com
>> 919-403-8260
>> lind...@tsmworks.com
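madunix's encryption question has a concrete answer: if blocks are encrypted with per-write randomness (a fresh IV or nonce each time) before they reach the dedupe engine, identical plaintext blocks no longer produce identical stored data, so duplicate detection finds nothing. The stdlib-only sketch below uses a toy XOR "cipher" purely to make the point; it is not real cryptography, and the names are illustrative.

```python
import hashlib
import os

def toy_encrypt(key: bytes, nonce: bytes, block: bytes) -> bytes:
    """Toy stream cipher (NOT real crypto): XOR the block against a
    keystream derived from key+nonce. Blocks must be <= 32 bytes here."""
    stream = hashlib.sha256(key + nonce).digest()
    return bytes(b ^ s for b, s in zip(block, stream))

key = b"secret"
block = b"same 16-byte blk"

# Plaintext dedupe: identical blocks hash identically -> one copy stored
assert hashlib.sha256(block).digest() == hashlib.sha256(block).digest()

# Encrypt-then-dedupe with random per-write nonces: two ciphertexts of
# the SAME block differ, so the appliance sees two "unique" chunks
c1 = toy_encrypt(key, os.urandom(16), block)
c2 = toy_encrypt(key, os.urandom(16), block)
print(c1 != c2)  # dedupe ratio collapses

# Deriving the key from the content itself (convergent encryption)
# restores dedupability, at the cost of revealing plaintext equality
k = hashlib.sha256(block).digest()
print(toy_encrypt(k, b"", block) == toy_encrypt(k, b"", block))
```

In practice this is why dedupe appliances typically want to see the data before encryption, and why encrypting at the appliance (after dedupe) or using convergent schemes are the usual ways to combine the two.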