Indeed, you will be sending 1.2 TB over the wire. I think the common practice is to export a snapshot from the local HDFS to a remote HDFS (or HDFS-alike, such as S3). The idea is that you get full bi-directional bandwidth (modulo top-of-rack switch oversubscription) between all peers in both clusters, since the export runs as a distributed copy rather than funneling through a single host.
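For reference, a typical export looks like the sketch below (snapshot name, destination URL, and tuning values are placeholders, not from this thread). The snapshot itself is cheap to take since it only records references; the ExportSnapshot job is what moves the actual HFiles as a MapReduce copy.

```shell
# Take a snapshot first (metadata only, no data copy) from the HBase shell:
#   hbase> snapshot 'mytable', 'mytable-snap-20150409'
#
# Then export it. -mappers controls copy parallelism; -bandwidth (MB/s per
# mapper) throttles the job so it doesn't saturate the inter-cluster link,
# which may address the infra team's concern.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot mytable-snap-20150409 \
  -copy-to hdfs://backup-cluster:8020/hbase \
  -mappers 16 \
  -bandwidth 50
```

Exporting to an HDFS-compatible target this way keeps the copy distributed; staging onto a single NFS mount would instead bottleneck on that one mount point.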
On Thu, Apr 9, 2015 at 11:46 AM, Serega Sheypak <[email protected]> wrote:

> Hi,
> what is the reason to back up HDFS? It's distributed, reliable,
> fault-tolerant, etc.
> NFS should be expensive in order to keep TBs of data.
>
> What problem are you trying to solve?
>
> 2015-04-09 20:35 GMT+02:00 Afroz Ahmad <[email protected]>:
>
> > We are planning to use the snapshot feature that takes a backup of a
> > table with 1.2 TB of data. We are planning to export the data using
> > ExportSnapshot and copy the resulting files to an NFS mount periodically.
> >
> > Our infrastructure team is very concerned about the amount of data that
> > will be going over the wire and how long it will take.
> >
> > This is just one table. There may be other tables in the future that we
> > want to back up.
> >
> > So I wanted to get a sense of what others are doing with ExportSnapshot.
> > What is the size of the tables that are backed up, and are the concerns
> > raised by our infra team valid?
> >
> > Thanks
> >
> > Afroz
