My understanding is that each core in a TSM server can process (identify duplicates in) around 60 GB of not-yet-deduplicated data per hour. That puts a quad-core server at about 3-4 TB of new data per day, assuming the identify processing does not run at night. Of course CPU architecture, memory and storage make a difference as well, but IBM says they see about 60 GB/hour per core and advise not going above 3-4 TB per day on a FILEPOOL with dedupe on TSM.
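As a rough check on that arithmetic, here is a small Python sketch. The 60 GB per core-hour figure is the one quoted above; the 12 to 16 hour processing window and the 1 TB = 1024 GB conversion are my own assumptions, and the function name is purely illustrative.

```python
# Rough sizing sketch, not an IBM formula.
# 60 GB per core-hour is the figure quoted in this post; everything else
# (hours of processing per day, 1 TB = 1024 GB) is an assumption.
GB_PER_CORE_HOUR = 60


def daily_dedup_tb(cores, hours_per_day, gb_per_core_hour=GB_PER_CORE_HOUR):
    """Return the TB of new data the cores can work through in one day."""
    return cores * hours_per_day * gb_per_core_hour / 1024


if __name__ == "__main__":
    # Compare a daytime-only window with running around the clock.
    for hours in (12, 16, 24):
        tb = daily_dedup_tb(cores=4, hours_per_day=hours)
        print(f"4 cores, {hours} h/day: {tb:.2f} TB")
```

With four cores this gives roughly 2.8 TB for a 12 hour window and 3.75 TB for 16 hours, which is where the 3-4 TB per day figure comes from; running around the clock would give about 5.6 TB.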

On Fri, Sep 30, 2011 at 12:29 AM, Colwell, William F. <bcolw...@draper.com> wrote:

> Hi Daniel,
>
> My main point was to say that your previous posts seemed to be saying that
> dedup storagepools were recommended to be 6 TB in size at most. It is my
> understanding the 6 TB recommendation was a daily server thruput maximum
> design target when dedup is in use.
>
> I agree, a processor at 100% is not good and I have been adjusting the
> server design to reduce the load.
>
> I started re-hosting our backup service on v6 as soon as v6 was available.
> I started out deduping everything but quickly ran into performance
> problems. To solve them I started excluding classes of data from dedup -
> all Oracle backups, all Outlook PST files and any other file larger than
> 1 GB. I also replaced all the disks I started with over 12 months and
> greatly expanded the total storage.
>
> Where the Redbook says that expiration is much improved, that is only
> partly true. If dedup is involved, a hidden process starts after the
> visible expiration process is done and runs on for quite a while longer.
> This process has to check if a chunk in an expired file can truly be
> removed from storage because it could be that other files are pointing to
> that chunk. You can see the process by entering 'show dedupdeleteinfo'
> after expiration completes.
>
> The thing about big files is that they are broken into lots of chunks.
> When a big file is expired, this hidden process will take a long time to
> complete and can bog down the system. This is the real reason I exclude
> some files from dedup.
>
> As for SATA, I have been using some big arrays (20 2TB disks, raid 6), 8
> such arrays, for 18 months and have had only 1 disk fail. But I try not to
> abuse them. Backups first go onto jbod disks - 15K rpm, 600GB - and all
> the dedup activity is done there. The storagepools on those disks are then
> migrated to storagepools on the SATA arrays. It is a mostly sequential
> process.
>
> I can only suggest that if your customer does storagepool backup from the
> SATA arrays after migration or reclaim, and the copypool is not dedup,
> then there would be a lot of random requests to the SATA storagepools to
> rehydrate the backups.
>
> Regards,
>
> Bill Colwell
> Draper Lab
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of
> Daniel Sparrman
> Sent: Thursday, September 29, 2011 1:24 AM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus
> file systems for pirmary pool
>
> Like it says in the document, it's a recommendation and not a technical
> limit.
>
> However, having the server running at 100% utilization all the time
> doesn't seem like a healthy scenario.
>
> Why aren't you deduplicating files larger than 1GB? From my experience,
> datafiles from SQL, Exchange and such have a very large de-dup ratio,
> while TSM's deduplication skips files smaller than 2KB?
>
> I have a customer up north who used this configuration on an HP EVA based
> box with SATA disks. The disks were breaking down so fast that the arrays
> within the box were in a constant "rebuild" phase. HP claimed it was TSM
> dedup that was breaking the disks (they actually claimed TSM was writing
> so often that the disks broke), a scenario I find very hard to believe.
>
> Best Regards
>
> Daniel
>
> Daniel Sparrman
> Exist i Stockholm AB
> Switchboard: 08-754 98 00
> Fax: 08-754 97 30
> daniel.sparr...@exist.se
> http://www.existgruppen.se
> Posthusgatan 1 761 30 NORRTÄLJE
>
> -----"ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote: -----
>
> To: ADSM-L@VM.MARIST.EDU
> From: "Colwell, William F." <bcolw...@draper.com>
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> Date: 09/28/2011 20:43
> Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file
> systems for pirmary pool
>
> Hi Daniel,
>
> I remember hearing about a 6 TB limit for dedup in a webinar or conference
> call, but what I recall is that that was a daily thruput limit. In the
> same section of the redbook as you quote is this paragraph -
>
> Experienced administrators already know that Tivoli Storage Manager
> database expiration was one of the more processor-intensive activities on
> a Tivoli Storage Manager Server. Expiration is still processor intensive,
> albeit less so in Tivoli Storage Manager V6.1, but this is now second to
> deduplication in terms of consumption of processor cycles. Calculating the
> MD5 hash for each object and the SHA1 hash for each chunk is a processor
> intensive activity.
>
> I can say this is absolutely correct; my processor is frequently running
> at or near 100%.
>
> I have gone way beyond 6 TB of storage for dedup storagepools as this sql
> shows for the 2 instances on my server -
>
> select cast(stgpool_name as char(12)) as "Stgpool", -
>        cast(sum(num_files) / 1024 / 1024 as decimal(4,1)) as "Mil Files", -
>        cast(sum(physical_mb) / 1024 / 1024 as decimal(4,1)) as "Physical_TB", -
>        cast(sum(logical_mb) / 1024 / 1024 as decimal(4,1)) as "Logical_TB", -
>        cast(sum(reporting_mb) / 1024 / 1024 as decimal(4,1)) as "Reporting_TB" -
> from occupancy -
> where stgpool_name in (select stgpool_name from stgpools where deduplicate = 'YES') -
> group by stgpool_name
>
> Stgpool        Mil Files  Physical_TB  Logical_TB  Reporting_TB
> -------------  ---------  -----------  ----------  ------------
> BKP_2              368.0          0.0        30.0          95.8
> BKP_2X             341.0          0.0        23.9          58.6
>
> Stgpool        Mil Files  Physical_TB  Logical_TB  Reporting_TB
> -------------  ---------  -----------  ----------  ------------
> BKP_2              224.0          0.0        35.7          74.1
> BKP_FS_2            49.0          0.0        21.0          45.5
>
> Also, I am not using any random disk pool, all the disk storage is scratch
> allocated file class volumes. There is also a tape library (lto5) for
> files larger than 1GB which are excluded from deduplication.
>
> Regards,
>
> Bill Colwell
> Draper Lab
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of
> Daniel Sparrman
> Sent: Wednesday, September 28, 2011 3:49 AM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus
> file systems for pirmary pool
>
> To be honest, it doesn't really say. The information is from the Tivoli
> Storage Manager Technical Guide:
>
> Note: In terms of sizing Tivoli Storage Manager V6.1 deduplication, we
> currently recommend using Tivoli Storage Manager to deduplicate up to 6 TB
> total of storage pool space for the deduplicated pools. This is a rule of
> thumb only and exists solely to give an indication of where to start
> investigating VTL or filer deduplication. The reason that a particular
> figure is mentioned is for guidance in typical scenarios on commodity
> hardware. If more than 6 TB of real diskspace is to be duplicated, you can
> either use Tivoli Storage Manager or a hardware deduplication device. The
> 6 TB is in addition to whatever disk is required by non-deduplicated
> storage pools. This rule of thumb will change as processor and disk
> technologies advance, because the recommendation is not an architectural,
> support, or testing limit.
>
> http://www.redbooks.ibm.com/redbooks/pdfs/sg247718.pdf
>
> I'm guessing it's server-side since client-side shouldn't use any
> resources @ the server. I'm also guessing you could do 8TB or 10, but not
> 60TB.
>
> Best Regards
>
> Daniel Sparrman
>
> -----"ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote: -----
>
> To: ADSM-L@VM.MARIST.EDU
> From: Hans Christian Riksheim <bull...@gmail.com>
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> Date: 09/28/2011 09:56
> Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file
> systems for pirmary pool
>
> This 6 TB supported limit for deduplicated FILEPOOL does this limit
> apply when one does client side deduplication only?
>
> Just wondering since I have just set up a 30 TB FILEPOOL for this purpose.
>
> Regards
>
> Hans Chr.
>
> On Tue, Sep 27, 2011 at 8:44 PM, Daniel Sparrman
> <daniel.sparr...@exist.se> wrote:
> > Just to put an end to this discussion, we're kinda running out of limits
> > here:
> >
> > a) No VTL solution, not DD, not Sepaton, not anyone, is a replacement
> > for random diskpools. Doesn't matter if you can configure 50 drives, 500
> > drives or 5000 drives, the way TSM works, you're gonna make the system
> > go bad since the system is made for having random pools in front,
> > sequential pools in the back. A sequential device is not gonna replace
> > that, regardless of whether it is a sequential file pool or a VTL (or,
> > for that matter, a tape library).
> >
> > b) VTLs were invented because most backup software (I've only worked
> > with TSM, Legato & Veritas aka Symantec) is used to working with
> > sequential devices. That hasn't changed, and won't change in the near
> > future. VTLs (and the file device option) are just a replacement.
> > Performance wise, VTLs are gonna win all the time compared to a file
> > device, question you need to ask yourself is, do I need the VTL, or can
> > I go along with using file devices. According to the TSM manual (don't
> > have the link, but if you want I'll find it) the maximum supported file
> > device pool for deduplication is 6TB... so if you're thinking of
> > replacing a VTL with a seq. file pool, keep that in mind. The limit is
> > because the amount of resources needed by TSM to do the file
> > deduplication is limited, or as the manual says, "until new technologies
> > are available".
> >
> > The discussion here where people are actually planning on just having a
> > sequential pool (since no one is actually discussing that there's a
> > random pool in front) is plain scary. No sequential device is gonna have
> > the time of its life having a fileserver serving 50K blocks at a time.
> >
> > So my last 50 cents worth is:
> >
> > a) Have a random pool in front
> >
> > b) Depending on the size of your environment, you're either gonna go
> > with a filepool and use de-dup (limit is 6TB for each pool, you might
> > not want to de-dup everything), or you're gonna go with a fullscale VTL.
> > Choice here is size vs costs.
> >
> > I've seen a lot of posts here lately about the disadvantages with VTLs
> > .. well, I haven't seen one this far with mine. I have a colleague who
> > bought a XXXX VTL and found out he needed another VTL just to do the
> > de-dup, since one VTL wasn't a supported configuration to do de-dup. I
> > have another colleague who bought a very cheap VTL solution (from a very
> > mentioned name around here) and ended up with having same hashes, but
> > different data, leaving him with unrestorable data.
> >
> > Comparing eggs to apples just isn't fair. Different manufacturers of
> > VTLs do different things, meaning both performance and availability are
> > completely different.
> >
> > Just to sum up, we've had both 3584's and (back in the days) 3575, and
> > I've never been happier with our VTL (and yes, we do restore tests).
> >
> > Best Regards
> >
> > Daniel
> >
> > -----"ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote: -----
> >
> > To: ADSM-L@VM.MARIST.EDU
> > From: Rick Adamson <rickadam...@winn-dixie.com>
> > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> > Date: 09/27/2011 18:02
> > Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for
> > pirmary pool
> >
> > Interesting. Every VTL based solution, including data domain, that I
> > looked at had limits on the amount of drives that could be emulated
> > which were nowhere near a hundred let alone a thousand. Perhaps it's
> > time to revisit this.
> >
> > The license is a data domain fee, and a hefty one at that.
> >
> > The bigger question I have is since the file based storage is native to
> > TSM why exactly is using a file based storage not supported?
> >
> > ~Rick
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of
> > Daniel Sparrman
> > Sent: Tuesday, September 27, 2011 10:30 AM
> > To: ADSM-L@VM.MARIST.EDU
> > Subject: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary
> > pool
> >
> > Not really sure where the general idea that a VTL will limit the number
> > of available mount points comes from.
> >
> > I'm not familiar with Data Domain, but generally speaking, the number of
> > virtual tape drives configured within a VTL is usually thousands. Not
> > sure why you'd want that many though, I always prefer having a small
> > diskpool in front of whatever sequential pool I have, and let the bigger
> > files pass the diskpool and go straight to the seq. pool.
> >
> > As far as for LAN-free, the only available option I know of is SANergy.
> > And going down that road (concerning both price & complexity) will
> > probably make the VTL look cheap.
> >
> > Not sure what kind of licensing you're talking about concerning VTL, but
> > I assume it's a Data Domain license and not a TSM license?
> >
> > Best Regards
> >
> > Daniel Sparrman
> >
> > -----"ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote: -----
> >
> > To: ADSM-L@VM.MARIST.EDU
> > From: Rick Adamson <rickadam...@winn-dixie.com>
> > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> > Date: 09/27/2011 16:52
> > Subject: Re: [ADSM-L] vtl versus file systems for pirmary pool
> >
> > A couple of things that I did not see mentioned here which I experienced
> > was.... for Data Domain the VTL is an additional license and it does
> > limit the available mount points (or emulated drives), where a TSM file
> > based pool does not. Like Wanda stated earlier depends what you can
> > afford !
> >
> > I myself have grown fond of using the file based approach, easy to
> > manage, easy to configure, and never worry about an available tape drive
> > (virtual or otherwise). The lan-free issue is something to consider but
> > from what I have heard lately is that it can still be accomplished using
> > the file based storage. If anyone has any info on it I would appreciate
> > it.
> >
> > ~Rick
> > Jax, Fl.
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of
> > Tim Brown
> > Sent: Monday, September 26, 2011 4:05 PM
> > To: ADSM-L@VM.MARIST.EDU
> > Subject: [ADSM-L] vtl versus file systems for pirmary pool
> >
> > What advantage does VTL emulation on a disk primary storage pool have
> > as compared to disk storage pool that is non vtl ?
> >
> > It appears to me that a non vtl system would not require the daily
> > reclamation process and also allow for more client backups to occur
> > simultaneously.
> >
> > Thanks,
> >
> > Tim Brown
> > Systems Specialist - Project Leader
> > Central Hudson Gas & Electric
> > 284 South Ave
> > Poughkeepsie, NY 12601
> > Email: tbr...@cenhud.com
> > Phone: 845-486-5643
> > Fax: 845-486-5921
> > Cell: 845-235-4255