Indeed, you will be sending 1.2 TB over the wire. I think the common
practice is to export a snapshot from the local HDFS to a remote HDFS (or
an HDFS-like store, such as S3). The idea is that you get full
bi-directional bandwidth (modulo top-of-rack switching) between all peers
in both clusters, rather than funneling everything through a single link.
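
For reference, the export runs as a single MapReduce job. A sketch of the
invocation (the snapshot name, destination URI, and tuning values here are
placeholders, not from this thread):

  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot my_table_snapshot \
    -copy-to hdfs://backup-nn:8020/hbase \
    -mappers 16 \
    -bandwidth 100

The mappers copy HFiles in parallel, and -bandwidth caps each mapper's copy
rate in MB/s so the export doesn't starve the source cluster. At, say, 16
mappers throttled to 100 MB/s each, 1.2 TB moves in roughly 13 minutes,
versus many hours through a single NFS link.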

On Thu, Apr 9, 2015 at 11:46 AM, Serega Sheypak <[email protected]>
wrote:

> Hi,
> what is the reason to back up HDFS? It's distributed, reliable,
> fault-tolerant, etc.
> NFS should be expensive for storing TBs of data.
>
>
> What problem are you trying to solve?
>
>
> 2015-04-09 20:35 GMT+02:00 Afroz Ahmad <[email protected]>:
>
> > We are planning to use the snapshot feature to take a backup of a table
> > with 1.2 TB of data. We are planning to export the data using
> > ExportSnapshot and copy the resulting files to an NFS mount periodically.
> >
> > Our infrastructure team is very concerned about the amount of data that
> > will be going over the wire and how long it will take.
> >
> > This is just one table. There may be other tables in the future that we
> > want to back up.
> >
> > So I wanted to get a sense of what others are doing with ExportSnapshot:
> > what size tables are being backed up, and are the concerns raised by our
> > infra team valid?
> >
> >
> > Thanks
> >
> > Afroz
> >
>
