Regarding the totals you mention (e.g. "Yet we can only replicate around
3TB daily when we backup around 7TB."): are these deduplicated replication
totals, or non-deduplicated, non-compressed totals?

Are you replicating deduplicated data?

We've been successfully replicating about 10TB of nightly non-deduplicated
data across our two datacenters for years now, and it's only been getting
better.  We get about 60% deduplication, so what we're really transferring
over the wire is about 4TB of data.  It used to take 6 to 8 hours to
replicate that much data across our network, but we changed the DB storage
to essentially SSD (as part of a separate project), and the code base for
TSM replication has gotten stable enough that it now only takes about two
to three hours.
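
To make that arithmetic explicit, here is a rough Python sketch of the
numbers above.  The function names are just illustrative, and it assumes
decimal units (1 TB = 1e12 bytes) and a steady transfer rate over the whole
window, neither of which is exactly true in practice.

```python
# Rough arithmetic only; decimal units (1 TB = 1e12 bytes) are an assumption.

def wire_tb(logical_tb: float, dedup_savings: float) -> float:
    """Data actually sent once deduplication removes `dedup_savings` of it."""
    return logical_tb * (1.0 - dedup_savings)


def avg_mb_per_s(tb: float, hours: float) -> float:
    """Average throughput in MB/s needed to move `tb` terabytes in `hours`."""
    return tb * 1e6 / (hours * 3600.0)


if __name__ == "__main__":
    sent = wire_tb(10.0, 0.60)  # 10TB nightly, about 60% deduplication
    print(f"on the wire: {sent:.1f} TB")
    # Old window (6 to 8 hours) versus current window (2 to 3 hours).
    for hours in (8, 6, 3, 2):
        print(f"{hours} h window -> {avg_mb_per_s(sent, hours):.0f} MB/s average")
```

The point is only that the shorter window means a several-times-higher
sustained rate, which is where the faster DB storage shows up.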

One difference I notice from what you've described is that we're doing
1-to-1 replication: we have 4 servers, of which 2 are primaries and 2 are
replicas, each dedicated to one primary.  We're now moving those replicas
to the cloud to see whether we can use AWS EC2 instances and S3 as our DR
servers (plus, the AWS TSM instance will be our AWS TSM target; yes, there
are still reasons to have backups in the cloud).  That's still early in
production, so we haven't taxed that replication network much yet.

We also use dedicated disk for both DBs and stgpools (still on file-pools,
so we're not running any protect stgpool commands).  We don't have another
Ethernet network to traverse for the file-pool traffic.  That dedicated
storage sits on a pair of Dell MD arrays behind old 8Gb FC switches.  It's
nice to have that low-latency backbone while we've still got it.  We're on
TSM 8.latest, and there might have been some performance bugs at the 7.1
level, if I remember correctly.

The one thing I did notice is that the more you can spread out your I/O
across your storage arrays (whether it's DB or STGPOOL targets), the better
the performance.  For example, our TSM server has 16 filesystems for the
database and 16 mountpoints for the filepool directories.  Do you see your
Isilon backend as a bottleneck at all?  Is the TSM server pushing load to
as many of those Isilon nodes as possible?  Or is it really the enumeration
of the replication data that takes a long time (that's possibly a DB
bottleneck)?  More questions than answers from me, but I hope I've pointed
you in the right direction.
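
As a quick sanity check on where to look, here is a rough Python sketch of
how much of your replication link 3TB a day would use on average.  It
assumes decimal units, that the two 10G links for IP traffic add up to
20 Gbit/s, and that replication could run around the clock; the function
names are illustrative, and if the 3TB figure is pre-deduplication the real
wire traffic would be lower still.

```python
# Back-of-the-envelope link utilization; all inputs are assumptions to adjust.

def avg_gbit_per_s(tb_per_day: float, window_hours: float = 24.0) -> float:
    """Average rate in Gbit/s needed to move `tb_per_day` TB in the window."""
    bits = tb_per_day * 1e12 * 8
    return bits / 1e9 / (window_hours * 3600.0)


def link_utilization(tb_per_day: float, link_gbit_per_s: float,
                     window_hours: float = 24.0) -> float:
    """Fraction of the link's nominal capacity used at that average rate."""
    return avg_gbit_per_s(tb_per_day, window_hours) / link_gbit_per_s


if __name__ == "__main__":
    for tb in (3.0, 7.0):  # what replicates today vs. what gets backed up
        rate = avg_gbit_per_s(tb)
        used = link_utilization(tb, 20.0)
        print(f"{tb:.0f} TB/day -> {rate:.2f} Gbit/s average, {used:.1%} of 20 Gbit/s")
```

If the average comes out at a small fraction of the link, that would point
away from raw bandwidth and toward storage or DB I/O, which is why I'm
asking about the Isilon and the enumeration step.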

Thanks and good luck!

On Thu, Apr 26, 2018 at 2:46 PM, Zoltan Forray <> wrote:

> As we get deeper into Replication and my boss wants to use it more and more
> as an offsite recovery platform.
> As we try to reach "best practices" of replicating everything, we are
> finding this desire to be difficult if not impossible to achieve due to the
> resource demands.
> Total we want to eventually replicate is around 700TB from 5-source servers
> to 1-target server which is dedicated to replication.
> So the big question is, can this be done?
> We recently rebuilt the offsite target server to as big as we could afford
> ($38K).  It has 256GB of RAM.  64-threads of CPU. Storage is primarily
> 500TB of ISILON/NFS. Connectivity is via quad 10G (2-for IP traffic from
> source servers and 2-for ISILON/NFS).
> Yet we can only replicate around 3TB daily when we backup around 7TB.
> Looking for suggestions/thoughts/experiences?
> All boxes are RHEL Linux and
> --
> *Zoltan Forray*
> Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> Xymon Monitor Administrator
> VMware Administrator
> Virginia Commonwealth University
> UCC/Office of Technology Services
> - 804-828-4807
