There have been numerous questions and posts regarding the "up to 10x" improvement in TSM data deduplication seen when upgrading to TSM V7.1. IBM would like to explain this statement further.
The up to 10x improvement was based on a comparison of the prior published maximum daily dedup capability for a single server (e.g. 3TB daily) versus the new published maximum daily dedup capability (e.g. 30TB). In this context, "dedup capability" includes ingest of client data on the server and deduplication of that data. It also includes the post-processing work: creating a second copy of the data, reclamation, and expiration/deref.

The baseline was V6.2 and was published in the original Effective Planning and Use of IBM TSM Dedup Whitepaper in August 2012. The baseline encompassed all daily ingest and deduplication processing plus creation of a second copy using storage pool backup.

The new measurement, which achieved the up to 10x improvement over the original 3TB, was taken against V6.3.4.200. It used client-side dedup with optimized hardware (e.g. SSD) and improved server code in several paths (reclamation and expiration, among others). The maximum daily server-side deduplication capability was improved to 20TB through a combination of the optimized code paths and a different overlapping of some server processes during the 24-hour workload. In both the 20TB and 30TB cases, storage pool backup was replaced with node replication to create the second copy.

In IBM labs, we saw improvements on pre-existing test configurations with code changes alone, although not 10x. It was the combination of improved hardware, code updates, and the transition to node replication together with client-side deduplication that made 30TB achievable. The 10x improvement comes from comparing the original published daily ingest recommendation of 3TB with the updated published daily ingest recommendation of 30TB. The Effective Planning and Use of IBM TSM Dedup Whitepaper has now been updated to reflect the 20TB and 30TB numbers, and provides guidance on the hardware and configuration best practices required to achieve a 20TB or 30TB daily ingest.
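For readers who want to check the arithmetic behind the headline number, here is a minimal sketch in Python. The constant and function names are illustrative only and do not come from the whitepaper; the three figures are the published daily maximums quoted above.

```python
# Published maximum daily dedup figures quoted in this note (TB per day).
# Names are hypothetical, chosen only for this illustration.
BASELINE_TB = 3       # V6.2 baseline, storage pool backup for the second copy
SERVER_SIDE_TB = 20   # updated guidance, server-side dedup + node replication
CLIENT_SIDE_TB = 30   # updated guidance, client-side dedup + node replication


def improvement_factor(new_tb: float, baseline_tb: float = BASELINE_TB) -> float:
    """Return how many times larger new_tb is than baseline_tb."""
    return new_tb / baseline_tb


if __name__ == "__main__":
    print(f"client-side: {improvement_factor(CLIENT_SIDE_TB):.1f}x")  # 10.0x
    print(f"server-side: {improvement_factor(SERVER_SIDE_TB):.1f}x")  # 6.7x
```

Note that these are ratios of published ceilings measured on different hardware and configurations, so they are not a prediction of what a code upgrade alone will deliver on an existing server.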
A specific TSM server configuration may see a smaller improvement, or none, if the workload exceeds the capabilities of the hardware. Most customers will see some performance improvement, and some customers are reporting significant improvement over prior levels.

In addition, TSM 7.1 provides other code improvements that can benefit deduplication and node replication performance beyond what was initially provided in TSM 6.3.4.200. These include further optimization of expire/deref processing, improved deduplication handling of large objects via the SPLITLARGEOBJECTS node setting, and improvements to the identification of candidates for node replication, which can reduce the overall processing time for node replication.

To achieve the best results when using TSM deduplication, we recommend 7.1 or 6.3.4.300. These levels contain an additional update to expire/deref processing, documented by APAR IC97618, which benefits some customers.

Link to the Whitepaper:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Tivoli%20Storage%20Manager/page/Effective%20Planning%20and%20Use%20of%20IBM%20Tivoli%20Storage%20Manager%20V6%20Deduplication

Dave Canan
Solutions Response Team (SRT)
IBM Cloud & Smarter Infrastructure
Office: (916)-723-2410
Office Hours 9:00-5:00 PST, Mon. - Fri.
Office (Home Office): (916)-723-2410
E-mail: ddca...@us.ibm.com