This is a nice little piece on it:

Data deduplication vendors deploy different methods to detect unique information. Most vendors create a data-dependent fingerprint by applying a hashing algorithm on data blocks and comparing the result with previously calculated hashes. The hash results are usually stored on a disk as well. ProtecTIER uses a unique pattern matching and differencing algorithm (HyperFactor®) that identifies duplicate data. HyperFactor, IBM’s patented deduplication technology, first identifies “similar data” using a small key that fits into a deduplication appliance server’s memory-resident index and is not stored on disk. If an element looks similar, HyperFactor then performs a bit-level comparison between the new data and the similar data, storing only the bit-level differences. This unique method is more efficient because it dramatically reduces disk accesses for indexing, thus maintaining consistently high performance. In addition, ProtecTIER was designed to deliver 100% data integrity by avoiding the risks associated with hash collisions.
source: http://www.joshkrischer.com/files/JoshKrischerNativeReplication.pdf

On Fri, Oct 14, 2011 at 4:09 AM, Prather, Wanda <wprat...@icfi.com> wrote:

> Just asking,
> I was told that a Protectier doesn't use SHA1 and can't have a hash
> collision.
> Can anybody verify that?
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of
> Remco Post
> Sent: Wednesday, October 05, 2011 3:15 PM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] vtl versus file systems for pirmary pool
>
> Hi,
>
> I saw last week that about half of the people visiting the TSM Symposium
> were running V6, it's been stable for me so far.
>
> The likeliness of an accidental SHA1 hash collision is relatively small
> even compared to the total number of objects that a TSM server could
> possibly ever store during its entire lifetime, insignificant. That being
> said, if you think that your data is too valuable to even risk that, don't
> dedup.
>
>
> --
>
> Gr., Remco
>
> On 5 Oct 2011 at 19:24, Shawn Drew <
> shawn.d...@americas.bnpparibas.com> wrote:
>
> > Along this line, we are still using TSM5.5. Some of the features
> > mentioned previously require TSM6. TSM6 still feels risky to me.
> > Maybe more risky than a hash collision.
> > Just looking for a consensus, Do people think it's mature enough now
> > that it is as stable/reliable as TSM5 ?
> >
> > PS. Test restores are the only way to be sure your backups are good.
> > You shouldn't just "trust" TSM.
> >
> > Regards,
> > Shawn
> > ________________________________________________
> > Shawn Drew
> >
> > Internet
> > rrho...@firstenergycorp.com
> >
> > Sent by: ADSM-L@VM.MARIST.EDU
> > 10/05/2011 11:03 AM
> > Please respond to
> > ADSM-L@VM.MARIST.EDU
> >
> > To
> > ADSM-L
> > cc
> >
> > Subject
> > Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang:
> > Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl
> > versus file systems for pirmary pool
> >
> >> When TSM is duplicating your data (aka backing up storage pools),
> >> there is no logical connection between your primary storage pool and
> >> your copypool.
> >
> > Well . . .yes . .. no . . .
> >
> > All our eggs are in one basket no matter what. The logical connection
> > between pri and copy pools is TSM itself. A logical corruption in TSM
> > can take out both. Your data could be sitting there on tape and
> > completely useless. Yes, that's why we have TSM db backups, but are
> > they good? What if there is a TSM bug that renders all your backups
> > bad - we don't find out until we need it!
> >
> > At some point you have to trust something. We all trust TSM. Yes, we
> > do the db backup, create pri and copy pools, use reuse delay . .
> > .everything to allow for problems . . . but we are still trusting that
> > TSM works as advertised. A really, really paranoid would run two
> > complete separate/different backup systems - but who can afford that, or
> > want to?
> > But then, we do do that for our biggest SAP/Oracle systems. We use
> > Oracle/RMAN-to-flasharea/RMAN-to-TDPO/TSM, but we also run EMC/clone
> > backups off our DR sites R2's . . but also to TSM.
> >
> >
> > Rick
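
For anyone who wants to put rough numbers on the hash-collision question in this thread, here is a minimal Python sketch of the generic "fingerprint and compare" approach described in the quoted passage at the top, plus a birthday-bound estimate of how likely an accidental SHA-1 collision is. It is not how ProtecTIER or TSM is implemented. The class name HashDedupStore, the function collision_probability, the fixed block size and the block count used below are all assumptions made up for illustration, and the estimate assumes the hash behaves like a uniform random function (accidental collisions only, not deliberate attacks).

```python
import hashlib
import math


class HashDedupStore:
    """Toy block store that keeps one copy per SHA-1 fingerprint (hypothetical)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed fixed-size blocks
        self.blocks = {}              # fingerprint (hex string) -> block bytes

    def write(self, data):
        """Split data into blocks and return the list of fingerprints (the recipe)."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha1(block).hexdigest()
            # A block is stored only if its fingerprint has not been seen before.
            # Two different blocks with the same fingerprint (a collision) would
            # be treated as one block, which is the risk discussed in the thread.
            self.blocks.setdefault(fp, block)
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.blocks[fp] for fp in recipe)


def collision_probability(num_blocks, hash_bits=160):
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


store = HashDedupStore(block_size=4)
recipe = store.write(b"AAAABBBBAAAA")   # three blocks, two of them identical
restored = store.read(recipe)
p = collision_probability(10**15)       # assumed: 10^15 unique blocks stored
```

Under those assumptions, 10^15 unique blocks give a probability on the order of 10^-19, which is in line with the "insignificant" remark in the quoted thread. Whether that residual risk is acceptable is still a policy decision.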