I have a quick question for the group. I want to back up data to a DR HDFS cluster.
With "HDFS-8828 - Utilize Snapshot diff report to build diff copy list in distcp", it is much easier to build the copy file listing and keep the DR side in sync. I know there are filters that can be applied to DistCp to exclude certain files from the copy.

My question is this: suppose I have HBase running on the cluster as well. HBase handles its replication separately, so I still want to snapshot the root directory but exclude /hbase from the DistCp. Suppose we do the following:

- Take a snapshot on the source side (s0), then distcp that data to the destination cluster with the exclude filter on /hbase.
- Create a snapshot on the target cluster after shipping the data (also s0).
- Make changes on the source side, both in /hbase and elsewhere.
- Take another snapshot on the source side (s1) and distcp that over with the exclude filter on /hbase.

I know DistCp checks the snapshot on the target cluster to see whether anything has changed. It seems this would not work: before doing any copy, DistCp checks whether anything has changed between snapshots, and it would look as if things had changed on the target side.

Is this correct, or am I missing something?

Thank you,
Rahul
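
P.S. To make the steps above concrete, here is a rough sketch of the command sequence I have in mind. It only builds and prints the commands; it does not run them. The NameNode URIs, the filters file path, and the regex in that file are placeholders I made up for illustration, and I have not verified how -filters interacts with -diff.

```python
import shlex

# All three values below are hypothetical placeholders, not real endpoints.
SRC = "hdfs://source-nn:8020"        # assumed source NameNode URI
DST = "hdfs://dr-nn:8020"            # assumed DR NameNode URI
FILTERS = "/tmp/distcp-filters.txt"  # assumed local file, one regex per line,
                                     # e.g. a pattern such as .*/hbase(/.*)?


def initial_copy(snap="s0"):
    """Commands for the first full copy from a source snapshot."""
    return [
        # Snapshot the source root (the directory must be snapshottable).
        ["hdfs", "dfs", "-createSnapshot", SRC + "/", snap],
        # Copy the snapshot contents, excluding paths matched by the filters file.
        ["hadoop", "distcp", "-update", "-filters", FILTERS,
         SRC + "/.snapshot/" + snap, DST + "/"],
        # Snapshot the target under the same name once the copy has finished.
        ["hdfs", "dfs", "-createSnapshot", DST + "/", snap],
    ]


def incremental_copy(old="s0", new="s1"):
    """Commands for the snapshot-diff based incremental copy."""
    return [
        ["hdfs", "dfs", "-createSnapshot", SRC + "/", new],
        # -diff requires -update; DistCp expects the target to be unchanged
        # since its own snapshot named `old`.
        ["hadoop", "distcp", "-update", "-diff", old, new,
         "-filters", FILTERS, SRC + "/", DST + "/"],
    ]


if __name__ == "__main__":
    for cmd in initial_copy() + incremental_copy():
        print(shlex.join(cmd))
```

The second distcp is the step I am unsure about, since the target's s0 was taken without /hbase while the source diff between s0 and s1 includes changes under /hbase.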