There have been numerous questions and posts regarding the 10x
improvements with TSM data deduplication seen when upgrading to TSM V7.1.
IBM would like to explain this statement further.

The "up to 10x" improvement was based on a comparison of the previously
published maximum daily dedup capability for a single server (e.g. 3TB
daily) versus the newly published maximum daily dedup capability (e.g.
30TB). In this context, "dedup capability" includes ingest of client data
on the server and deduplication of that data. In addition, it includes the
post-processing of creating a second copy of the data, reclamation, and
expiration/deref. The baseline was V6.2 and was published in the original
Effective Planning and Use of IBM TSM Dedup Whitepaper in August 2012. The
baseline encompassed all daily ingest and deduplication processing plus
creation of a second copy using storage pool backup.

The new comparison, which achieved the up to 10x improvement over the
original 3TB, was made against V6.3.4.200 and was calculated using client
side dedup with optimized hardware (e.g. SSD) and improved server code in
several paths (reclamation and expiration among others). The maximum daily
server side deduplication capability was improved to 20TB through a
combination of the optimized code paths and different overlapping of some
server processes during the 24 hour workload. In both the 20TB and 30TB
cases, the storage pool backup was replaced with node replication to
achieve the second copy. In IBM labs, we saw improvements on pre-existing
test configurations with code changes alone, although not 10x. The
combination of improved hardware, code updates, and the transition to node
replication, together with client side deduplication, created the ability
to achieve 30TB. The 10x improvement comes from comparing the original
published daily ingest recommendation of 3TB with the updated published
daily ingest recommendation of 30TB.

The Effective Planning and Use of IBM TSM Dedup Whitepaper has now been
updated to reflect the 20TB and 30TB numbers, providing guidance on the
hardware and configuration best practices required to achieve a 20TB or
30TB daily ingest. A specific TSM server configuration may see a smaller
improvement, or none, if the workload exceeds hardware capabilities. Most
customers will see some performance improvement, and some customers are
reporting significant improvement over prior levels.
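
To make the arithmetic behind these numbers concrete, here is a small
Python sketch. It uses only the three published daily figures quoted above
(3TB, 20TB, 30TB). The function names, the use of decimal terabytes, and
the idea of spreading ingest evenly over 24 hours are assumptions made for
illustration only; they do not come from the whitepaper. Since the 24 hour
workload also has to fit the second copy, reclamation, and expiration, the
real ingest window is shorter and the averages below should be read as a
lower bound on the sustained rate the hardware must handle.

```python
# Published daily dedup capability figures quoted in this post (TB per day).
BASELINE_TB = 3        # V6.2 baseline, second copy via storage pool backup
SERVER_SIDE_TB = 20    # V6.3.4.200, server side dedup, node replication
CLIENT_SIDE_TB = 30    # V6.3.4.200, client side dedup, node replication

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def improvement_factor(new_tb, old_tb=BASELINE_TB):
    """Ratio of a new published daily figure to the old one."""
    return new_tb / old_tb


def average_ingest_mb_per_sec(tb_per_day):
    """Average MB/s if ingest were spread evenly over 24 hours (assumption)."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / 10**6


for label, tb in [("baseline", BASELINE_TB),
                  ("server side dedup", SERVER_SIDE_TB),
                  ("client side dedup", CLIENT_SIDE_TB)]:
    print(f"{label}: {tb}TB/day, {improvement_factor(tb):.1f}x baseline, "
          f"at least {average_ingest_mb_per_sec(tb):.0f} MB/s on average")
```

The 30TB figure works out to exactly 10x the 3TB baseline, while the 20TB
server side figure is roughly 6.7x, which is why the claim is worded as
"up to" 10x.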

In addition, TSM 7.1 provides other code improvements that can benefit
deduplication and node replication performance beyond what was initially
provided in TSM 6.3.4.200. This includes further optimization to the
expire/deref processing, improved deduplication handling of large objects
via the SPLITLARGEOBJECTS node setting, and improvements to the
identification of candidates for node replication which can improve the
overall processing time for node replication. To achieve the best results
when using TSM deduplication, we recommend 7.1 or 6.3.4.300. These
levels contain an additional update to expire/deref processing documented
by APAR IC97618 which benefits some customers.


Link to the Whitepaper:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Tivoli%20Storage%20Manager/page/Effective%20Planning%20and%20Use%20of%20IBM%20Tivoli%20Storage%20Manager%20V6%20Deduplication



Dave Canan
Solutions Response Team (SRT)
IBM Cloud & Smarter Infrastructure
Office: (916)-723-2410 Office Hours 9:00-5:00 PST, Mon. - Fri.
Office (Home Office): (916)-723-2410
E-mail: ddca...@us.ibm.com
