hadoop fs -put operates on a single thread at a time and writes the data to HDFS in order. Depending on the connectivity between the filer/NFS server and the datanodes, it may be difficult to saturate that connection, and saturating it is the only way to really speed things up. If the data is divided into multiple files, then, as mentioned in other posts, you can run several transfers in parallel and do a better job of getting the data into HDFS faster. Just be careful, as was stated before, that the NN can keep up with all of the data being transferred.
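As a rough illustration of the multi-file case, something like the following could drive several puts in parallel (the NFS mount point /mnt/nfs/export, target directory /data/incoming, and the parallelism of 8 are made-up examples, not from this thread):

    # Hypothetical sketch: copy every file under the NFS mount into HDFS,
    # running up to 8 hadoop fs -put processes at a time. Tune -P to what
    # the NFS link and the NN can keep up with.
    find /mnt/nfs/export -type f -print0 | \
      xargs -0 -P 8 -I{} hadoop fs -put {} /data/incoming/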
--Bobby Evans

On 1/24/12 11:16 PM, "Praveen Sripati" <praveensrip...@gmail.com> wrote:

> If it is divided up into several files and you can mount your NFS
> directory on each of the datanodes.

Just curious, how will this help?

Praveen

On Wed, Jan 25, 2012 at 12:39 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> If it is divided up into several files and you can mount your NFS
> directory on each of the datanodes, you could possibly use distcp to do it.
> I have never tried using distcp for this, but it should work. Or you can
> write your own streaming Map/Reduce script that does more or less the same
> thing as distcp: it takes as input the list of files to copy and does a
> hadoop fs -put for each file, reading it from NFS.
>
> --Bobby Evans
>
> On 1/24/12 12:49 AM, "rajmca2002" <rajmca2...@gmail.com> wrote:
>
> Hi,
>
> I have TBs of data in NFS that I need to move to HDFS. I have used the
> hadoop put command to do the same, but it took hours to place the files
> in HDFS. Is there any better approach to move large files to HDFS?
>
> Please reply asap.
> --
> View this message in context:
> http://old.nabble.com/Moving-TB-of-data-from-NFS-to-HDFS-tp33193061p33193061.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
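As a rough sketch of the distcp route suggested above, assuming the NFS export is mounted at the same path on every datanode (the mount point, namenode address, and target path below are made up for illustration):

    # Hypothetical sketch: distcp reads the source via file://, so the path
    # must resolve identically on every node running the copy tasks.
    hadoop distcp file:///mnt/nfs/export hdfs://namenode:8020/data/incoming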