Hi Raj,

If all of your data sits on an NFS-mounted disk, i.e. on a single machine,
then your upload will be limited by that machine's network bandwidth. You can
try running dfs -put in multiple parallel threads, one per distinct data set,
which may let you use the network bandwidth to its maximum. Take care not to
start too many threads, otherwise the namenode handlers will be busy all the
time and DFS will become unresponsive. I don't see any other way to make it
faster: a faster upload requires the data source to be present at distributed
locations, which is not the case here.
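
A rough sketch of the parallel put idea, in case it helps. It only assumes
that the hadoop command is on your PATH; the function names, the default of 4
workers and the injectable runner argument are made up for illustration and
are not part of Hadoop.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_put_command(src, dst_dir):
    """Return the argv for uploading one local path into an HDFS directory."""
    return ["hadoop", "fs", "-put", src, dst_dir]


def parallel_put(sources, dst_dir, max_workers=4, runner=subprocess.call):
    """Run one `hadoop fs -put` per source, at most max_workers at a time.

    Returns a dict mapping each source path to the exit code of its upload.
    Keep max_workers small so the namenode handlers are not kept busy.
    The runner argument is a hypothetical hook so the command can be
    swapped out (e.g. for a dry run); by default it shells out for real.
    """
    def upload(src):
        return src, runner(build_put_command(src, dst_dir))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(upload, sources))
```

You would call it with something like parallel_put(["/mnt/nfs/set1",
"/mnt/nfs/set2"], "/user/raj/data", max_workers=2), where both the NFS paths
and the HDFS directory are placeholders. Any source whose exit code is
non-zero needs to be retried. Start with a low worker count and raise it only
while throughput keeps improving.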

-Ajit


On Wed, Jan 25, 2012 at 10:46 AM, Praveen Sripati
<praveensrip...@gmail.com> wrote:

> > If it is divided up into several files and you can mount your NFS
> > directory on each of the datanodes.
>
> Just curious, how will this help?
>
> Praveen
>
> On Wed, Jan 25, 2012 at 12:39 AM, Robert Evans <ev...@yahoo-inc.com>
> wrote:
>
> > If it is divided up into several files and you can mount your NFS
> > directory on each of the datanodes, you could possibly use distcp to do
> > it. I have never tried using distcp for this, but it should work. Or you
> > can write your own streaming Map/Reduce script that does more or less the
> > same thing as distcp and will take as input the list of files to copy, and
> > will do a hadoop fs -put for each file having it come from NFS.
> >
> > --Bobby Evans
> >
> > On 1/24/12 12:49 AM, "rajmca2002" <rajmca2...@gmail.com> wrote:
> >
> >
> >
> > Hi,
> >
> > I have terabytes of data on NFS that I need to move to HDFS. I used the
> > hadoop put command to do this, but it took hours to place the files in
> > HDFS. Is there a good approach for moving large files to HDFS?
> >
> > Please reply asap.
> > --
> > View this message in context:
> > http://old.nabble.com/Moving-TB-of-data-from-NFS-to-HDFS-tp33193061p33193061.html
> > Sent from the Hadoop core-dev mailing list archive at Nabble.com.
> >
> >
> >
>
