Hi Raj,

If all of your data sits on an NFS-mounted disk, meaning it is served from a single machine, then your upload will be limited by that machine's network bandwidth. You can try running dfs -put in several parallel threads, each on a distinct data set; that way you might be able to use the network bandwidth to its maximum. Take care not to run too many threads, otherwise the namenode handlers will be busy all the time and dfs will become unresponsive. I don't see any other way to make it faster: a faster upload requires the data source to be spread across distributed locations, which is not the case here.
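To make the parallel put idea concrete, here is a rough sketch in Python. It is not something I have run against a real cluster. It assumes the hadoop command is on the PATH and that the data is already split into distinct files or directories. The function names, the default of 4 workers and the run parameter are my own choices for illustration; tune the worker count down if the namenode starts to struggle.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_put_command(local_path, hdfs_dir):
    """Build the argument list for one upload: hadoop fs -put <local> <hdfs dir>."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def parallel_put(local_paths, hdfs_dir, max_workers=4, run=subprocess.call):
    """Run one 'hadoop fs -put' per path, at most max_workers at a time.

    max_workers caps the number of concurrent uploads so the namenode
    handlers are not kept busy all the time (4 is an assumed starting point).
    run is the function that executes a command and returns its exit code;
    it defaults to subprocess.call and can be swapped out for a dry run.
    Returns a dict mapping each local path to the exit code of its upload.
    """
    def put_one(path):
        return path, run(build_put_command(path, hdfs_dir))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(put_one, local_paths))
```

For example, parallel_put(["/mnt/nfs/data/part1", "/mnt/nfs/data/part2"], "/user/raj/data") would start two uploads side by side (both paths are made up). Any non-zero exit code in the returned dict marks an upload that needs to be retried.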
-Ajit

On Wed, Jan 25, 2012 at 10:46 AM, Praveen Sripati <praveensrip...@gmail.com> wrote:

> > If it is divided up into several files and you can mount your NFS
> > directory on each of the datanodes.
>
> Just curious, how will this help.
>
> Praveen
>
> On Wed, Jan 25, 2012 at 12:39 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
> > If it is divided up into several files and you can mount your NFS
> > directory on each of the datanodes, you could possibly use distcp to do it.
> > I have never tried using distcp for this, but it should work. Or you can
> > write your own streaming Map/Reduce script that does more or less the same
> > thing as distcp and will take as input the list of files to copy, and will
> > do a hadoop fs -put for each file having it come from NFS.
> >
> > --Bobby Evans
> >
> > On 1/24/12 12:49 AM, "rajmca2002" <rajmca2...@gmail.com> wrote:
> >
> > Hi,
> >
> > I have TB of Data in NFS i need to move this data to hdfs. I have used
> > hadoop put command to do the same, but it resulted in taking hours to place
> > the file in HDFS, Is there any good approach to move large files to hdfs.
> >
> > Please reply asap.
> > --
> > View this message in context:
> > http://old.nabble.com/Moving-TB-of-data-from-NFS-to-HDFS-tp33193061p33193061.html
> > Sent from the Hadoop core-dev mailing list archive at Nabble.com.