Ivan Andika created HDDS-12307:
----------------------------------
Summary: Realtime Cross-Region Bucket Replication
Key: HDDS-12307
URL: https://issues.apache.org/jira/browse/HDDS-12307
Project: Apache Ozone
Issue Type: New Feature
Reporter: Ivan Andika
Assignee: Ivan Andika
Currently, there are a few cross-region (geo-replicated) DR solutions for Ozone
buckets:
* Run a periodic distcp from the source bucket to the target bucket
* Take a snapshot of the bucket and send it to the remote DR site
([https://ozone.apache.org/docs/edge/feature/snapshot.html])
The current approaches have both pros and cons:
* Pros
** It is simpler: Setting up periodic jobs can be done quite easily (e.g.
using cron jobs)
** DistCp sets up a MapReduce job that parallelizes the copy from the source
cluster to the target cluster
** No additional components needed
* Cons
** It is not “realtime”: The freshness of the data depends on how frequently and
how quickly the distcp runs
** It incurs significant overhead on the source cluster: It requires scanning
all the files in the source cluster (and possibly in the target cluster)
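As a rough illustration of the freshness point above, the following sketch
estimates the worst-case staleness of the target bucket under periodic copying.
It assumes each run only picks up data written before the run started and that
runs never overlap; the function name and parameters are hypothetical and not
part of Ozone or DistCp.

```python
def worst_case_staleness_s(interval_s: float, run_duration_s: float) -> float:
    """Estimate the worst-case lag (seconds) of the target behind the source.

    Assumption: a write that lands just after a run starts is only picked up
    by the next run, and becomes visible on the target when that run finishes.
    """
    if interval_s <= 0 or run_duration_s < 0:
        raise ValueError("interval must be positive and duration non-negative")
    # Runs cannot start more often than they finish (no overlapping runs).
    effective_interval_s = max(interval_s, run_duration_s)
    return effective_interval_s + run_duration_s


if __name__ == "__main__":
    # Hourly job that takes 15 minutes to copy.
    print(worst_case_staleness_s(3600, 900))
```

Under these assumptions, shortening the interval helps only until the copy
duration itself becomes the dominant term.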
Cloudera Replication Manager
([https://docs.cloudera.com/replication-manager/1.5.4/replication-policies/topics/rm-pvce-understand-ozone-replication-policy.html])
adds incremental replication support after the initial bootstrap step by
taking the snapdiff between two snapshots. This is better since there is no
need to list all the keys under the bucket again, but it is still not
technically realtime, because changes reach the target only as often as
snapshots are taken.
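The snapdiff-based incremental step described above can be sketched with a toy
in-memory model. This is only an illustration of the idea, not how Ozone or
Replication Manager implements it: buckets are plain dictionaries, the diff is
computed by comparing them directly, and the operation names (CREATE, MODIFY,
DELETE) and both function names are assumptions made for the example. Renames
are left out for brevity.

```python
from typing import Dict, List, Tuple

Bucket = Dict[str, bytes]      # toy bucket: key name -> object bytes
DiffEntry = Tuple[str, str]    # (operation, key name)


def snapshot_diff(old: Bucket, new: Bucket) -> List[DiffEntry]:
    """Toy stand-in for a snapdiff report between two snapshots."""
    diff: List[DiffEntry] = []
    for key in sorted(new.keys() - old.keys()):
        diff.append(("CREATE", key))
    for key in sorted(old.keys() - new.keys()):
        diff.append(("DELETE", key))
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            diff.append(("MODIFY", key))
    return diff


def apply_diff(diff: List[DiffEntry], new_snapshot: Bucket, target: Bucket) -> int:
    """Apply the diff to the target; return how many objects were copied."""
    copied = 0
    for op, key in diff:
        if op == "DELETE":
            target.pop(key, None)
        else:
            # CREATE and MODIFY both copy the object from the newer snapshot.
            target[key] = new_snapshot[key]
            copied += 1
    return copied


if __name__ == "__main__":
    snap1 = {"a": b"1", "b": b"2", "c": b"3"}
    snap2 = {"a": b"1", "b": b"22", "d": b"4"}
    target = dict(snap1)               # target already holds the bootstrap copy
    changes = snapshot_diff(snap1, snap2)
    copied = apply_diff(changes, snap2, target)
    print(changes, copied, target == snap2)
```

Only the changed keys are touched, so the unchanged key "a" is never listed or
copied again. The target still lags by the time between snapshots.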
This ticket is to track possible solutions for realtime asynchronous bucket
replication between two clusters in different regions (with 100+ ms latency).