hadoop fs -put operates on a single thread at a time and writes the data to HDFS in order. Depending on the connectivity between the filer/NFS server and the datanodes, it may be difficult to saturate that connection, and saturating it is the only way to really speed things up. If the data is divided into multiple files, then, as mentioned in other posts, you can run several transfers in parallel and do a better job of getting the data into HDFS faster. Just be careful, as was stated before, that the NN can keep up with all of the data being transferred.
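As a rough illustration of the multi-file case, something like the following could drive several puts in parallel (the NFS mount point /mnt/nfs/export, target directory /data/incoming, and the parallelism of 8 are made-up examples, not from this thread):

    # Hypothetical sketch: copy every file under the NFS mount into HDFS,
    # running up to 8 hadoop fs -put processes at a time. Tune -P to what
    # the NFS link and the NN can keep up with.
    find /mnt/nfs/export -type f -print0 | \
      xargs -0 -P 8 -I{} hadoop fs -put {} /data/incoming/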
--Bobby Evans

On 1/24/12 11:16 PM, "Praveen Sripati" <praveensrip...@gmail.com> wrote:

> If it is divided up into several files and you can mount your NFS
> directory on each of the datanodes.

Just curious, how will this help?

Praveen

On Wed, Jan 25, 2012 at 12:39 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> If it is divided up into several files and you can mount your NFS
> directory on each of the datanodes, you could possibly use distcp to do it.
> I have never tried using distcp for this, but it should work. Or you can
> write your own streaming Map/Reduce script that does more or less the same
> thing as distcp: it takes as input the list of files to copy and does a
> hadoop fs -put for each file, reading it from NFS.
>
> --Bobby Evans
>
> On 1/24/12 12:49 AM, "rajmca2002" <rajmca2...@gmail.com> wrote:
>
> Hi,
>
> I have TBs of data in NFS that I need to move to HDFS. I have used the
> hadoop put command to do the same, but it took hours to place the files
> in HDFS. Is there any better approach to move large files to HDFS?
>
> Please reply asap.
> --
> View this message in context:
> http://old.nabble.com/Moving-TB-of-data-from-NFS-to-HDFS-tp33193061p33193061.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
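As a rough sketch of the distcp route suggested above, assuming the NFS export is mounted at the same path on every datanode (the mount point, namenode address, and target path below are made up for illustration):

    # Hypothetical sketch: distcp reads the source via file://, so the path
    # must resolve identically on every node running the copy tasks.
    hadoop distcp file:///mnt/nfs/export hdfs://namenode:8020/data/incoming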