Wanda,

Thanks for your cogent analysis. Always appreciated.

We're trying to decide if we need to offer a Data Domain sort of thing to our customers. In the very specific case you describe, perhaps. I am 100% with you on "why replicate backup when you can more easily replicate data?" We're offering Compellent as our active data repository, and it has a very nice replication feature that is very bandwidth-friendly. I think money is better spent there than on replicating backup data. But try convincing a customer that's had the Kool-Aid that they don't want de-duplication!

Your comment about management classes is right on! If you limit the number of versions of a database backup that you keep to something reasonable, say seven, then with a 1 TB database (which is big!) you have 7 TB of duplicate data, worst case. Let's see: that breaks down to about 7 LTO-4 tapes, or 10 750 GB SATA drives. That's 7 x $100 = $700 for tape, plus slots of course, so call it $2,000. For disk, depending on your vendor, that could cost between $3K and $8K (and if you're paying more than that for SATA drives, you perhaps ought to seek counseling!). So how much would you be willing to spend to reduce this cost? No more than $8K. Does a DD box cost less than that? I'm not thinking so. And unless my math is way off, the argument against de-dup only gets stronger the more database data you add.
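For anyone who wants to poke at that math, here is a quick back-of-the-envelope script. The prices and the roughly-1-TB-per-LTO-4-cartridge figure are just the 2007 ballpark numbers from this thread, not quotes from any vendor:

import math

versions = 7                  # versions kept, per the management class
db_size_tb = 1.0              # one big database
duplicate_tb = versions * db_size_tb           # 7 TB worst case

# Tape: assume ~1 TB per LTO-4 cartridge with light compression
tapes = math.ceil(duplicate_tb / 1.0)          # ~7 cartridges
tape_media = tapes * 100                       # ~$100 per cartridge
tape_all_in = 2000                             # media plus library slots

# Disk: 750 GB SATA drives
drives = math.ceil(duplicate_tb * 1000 / 750)  # 10 drives
disk_low, disk_high = 3000, 8000               # vendor-dependent spread

print(f"tape: {tapes} cartridges, ~${tape_media} media, ~${tape_all_in} all-in")
print(f"disk: {drives} drives, ~${disk_low} to ~${disk_high}")
print(f"so a de-dup box has to come in under ~${disk_high} to pay off here")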
It's all about mind share, isn't it? Today, de-duplication is hot...

Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
[EMAIL PROTECTED]

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Wanda Prather
Sent: Wednesday, August 29, 2007 3:12 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

Kelly,

I have more than one customer considering a de-dup VTL product.

It's true that for regular file systems, TSM doesn't redump unchanged files, so people aren't getting AS LARGE a reduction in stored data (of that type) as would a user of an old-style full / incremental / incremental / full dump product.

OTOH, even with TSM, your DB dumps (Exchange, SQL, most Oracle implementations) are still for the most part full dumps, followed by incrementals, then full dumps. The larger the database, in most cases, the less the contents change. And you can't use subfile backup on anything larger than 2 GB.

I have several customers with a relatively small number of clients (say 50 or fewer) where the bulk of the daily backup data is one or two very large databases. And the bulk of the CONTENTS of those databases doesn't change all that much.

Send that DB full dump to a de-dup VTL that can identify duplicate "blobs" (I'm using that as a generic term, because I don't mean "block" in the sense of a disk block or sector, and different vendors can identify larger or smaller duplicate blobs), and you get a very large impact that TSM can't provide. The only thing that gets stored each day is the delta bits. Even if it's an Exchange/SQL/Oracle full-dump day, the amount of new data to be stored may be 10% or less of what it used to be.
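To make the "blob" idea concrete, here is a toy sketch of the general technique: fingerprint each chunk of the incoming stream and keep only the chunks you have not seen before. Real products use content-defined, variable-size blobs and far more careful engineering; the fixed 128 KB chunks and the ~10% nightly change rate below are illustrative assumptions, not any vendor's design:

import hashlib
import os

CHUNK = 128 * 1024
repository = {}  # fingerprint -> chunk: the de-dup store

def ingest(stream: bytes) -> int:
    """Store one backup stream; return how many bytes were actually new."""
    new_bytes = 0
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in repository:       # a duplicate blob is stored only once
            repository[fp] = chunk
            new_bytes += len(chunk)
    return new_bytes

# Two nightly "full dumps" of a database where roughly 10% changed
night1 = os.urandom(1024 * 1024)               # stand-in for a full dump
night2 = bytearray(night1)
night2[:100 * 1024] = os.urandom(100 * 1024)   # ~10% of the pages changed

print(ingest(night1))          # first full: every chunk is new
print(ingest(bytes(night2)))   # second full: only the changed chunks land

The second "full" stores about an eighth of the data here; that is the whole pitch in a dozen lines.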
And I have more than one customer looking at a de-dup VTL as a way to make managing their own DR sites practical, because those VTLs can replicate to EACH OTHER across the WAN. The huge cost in transmitting your data to a DR site is the cost of the pipe. If, however, you can get the amount of data per day down to 10% of what it used to be by having the VTL compress and de-dup, and you have another corporate location where you can put the other VTL, it starts looking close to cost-effective in $$ terms.
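Here is the pipe math that makes that 10% figure matter. The 2 TB nightly volume and the link speeds are illustrative assumptions only:

def hours(gb: float, mbps: float) -> float:
    """Transfer time in hours for gb gigabytes over an mbps link, ignoring overhead."""
    return gb * 8 * 1000 / mbps / 3600

nightly_gb = 2000                 # say, 2 TB of raw nightly backup
deduped_gb = nightly_gb * 0.10    # VTL compress + de-dup leaves ~10%

for name, mbps in [("T3, 45 Mb/s", 45), ("100 Mb/s", 100), ("OC-3, 155 Mb/s", 155)]:
    print(f"{name}: raw {hours(nightly_gb, mbps):6.1f} h, "
          f"de-duped {hours(deduped_gb, mbps):5.1f} h")

Raw full dumps would take about four days per night's data over a T3; at 10%, the same night's data fits in about ten hours on the same pipe.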
In fact, IBM recovery services is offering Data Domain equipment on the floor in at least one of their recovery sites for that purpose. (The customer installs a DD box on their site, leases the DD box in the IBM DR site, and replicates between them.)

(Insert disclaimer here: I'm not necessarily a fan of replicating backup data, because the problem my customers always have is doing the DB recovery. I think the first choice should be replicating the real DB using something like MIMIX, so that it's always ready to go on the recovery end. I merely report the bit about replicating backup data because I have customers considering it.)

Regarding the lost sales opportunities, I think you gotta go back and consider the features that TSM has that other people don't, de-dup or not. There was a discussion on the list last month comparing TSM to Legato and others, and there was remarkably little emphasis on management classes and the ability of TSM to treat different data differently according to business needs. I still haven't seen any other product that has what TSM provides. (Here not afraid to expose MY ignorance - would like to know if there is anything else out there -)

Wanda

> I'd like to steer this around a bit. Our sales folks are saying they
> are losing TSM opportunities to de-dup vendors. What specific
> business problem are customers trying to solve with de-dup?
>
> I'm thinking the following:
>
> 1. Reduce the amount of disk/tape required to store backups.
> Especially important for an all-disk backup solution.
> 2. Reduce backup times (for source de-dup, I would think; no benefit
> from target de-dup for this).
> 3. Replication of backup data across a wide area network. Obviously,
> if you have less stored, you have less to replicate.
>
> Others? Relative importance of these?
>
> Does TSM in and of itself provide similar benefits in its natural
> state? From this discussion, adding de-dup at the back end does not
> necessarily provide much, though it does for the other traditional
> backup products. Since we don't dup, we don't need to de-dup.
>
> Help me get it, because aside from the typical "I gotta have it
> because the trade rags tell me I gotta have it", I don't get it!
>
> Thanks, (Once again not afraid to expose my vast pool of ignorance...)
>
> Kelly J. Lipp
> VP Manufacturing & CTO
> STORServer, Inc.
> 485-B Elkton Drive
> Colorado Springs, CO 80907
> 719-266-8777
> [EMAIL PROTECTED]
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf
> Of Curtis Preston
> Sent: Wednesday, August 29, 2007 1:08 PM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] Data Deduplication
>
>>As de-dup, from what I have read, compares across all files on a
>>"system" (server, disk storage or whatever), it seems to me that this
>>will be an enormous resource hog
>
> Exactly. To make sure everyone understands, the "system" is the
> intelligent disk target, not a host you're backing up. A de-dupe
> IDT/VTL is able to de-dupe anything against anything else that's been
> sent to it. This can include, for example, a file in a filesystem and
> the same file inside an Exchange Sent Items folder.
>
>>The de-dup technology only compares / looks at the files within its
>>specific repository. Example: We have 8 ProtecTIER nodes in one data
>>center, which equates to 8 Virtual Tape Libraries and 8 repositories.
>
> There are VTL/IDT vendors that offer a multi-head approach to
> de-duplication. As you need more throughput, you buy more heads, and
> all heads are part of one large appliance that uses a single global
> de-dupe database. That way you don't have to worry about which
> backups go to which heads. Diligent's VTL Open is a multi-headed VTL,
> but ProtecTIER is not -- yet. I would ask them their plans for that.
>
> While this feature is not required for many shops, I think it's a very
> important feature for large shops.
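To picture what that single global de-dupe database buys you, here is a tiny sketch. With separate per-head repositories, the same blob is stored once per head; with a shared fingerprint index, it is stored once, period. This is an illustration of the idea, not any vendor's internals:

import hashlib

def fp(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

blob = b"the same 128K blob, landing on two different heads"

# Separate repositories: independent nodes each keep their own copy
repo_a, repo_b = {}, {}
repo_a[fp(blob)] = blob
repo_b[fp(blob)] = blob
print("separate repositories:", len(repo_a) + len(repo_b), "copies stored")

# Multi-head appliance: every head checks the same global index
global_repo = {}
for head in ("head-1", "head-2"):
    global_repo.setdefault(fp(blob), blob)   # the second head finds a dup
print("global index:", len(global_repo), "copy stored")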