Hey,
Are all the mappings done? If it's waiting for the last mapping to
finish, it can't copy the output of that last mapping, meaning the
average copy speed goes way down.
In other words, you are comparing the theoretical instantaneous copy
speed (1Gbps) versus the printout of the average speed, which includes
the amount of time for all the mappings to finish.
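The effect of that waiting on the reported number can be sketched with a
toy calculation (all sizes and times below are hypothetical, not taken
from the job in question):

```python
# Toy illustration: why the reported average copy speed drops when the
# reducer has to wait for the last map output before it can copy it.
# All numbers are made up for illustration.

def avg_copy_speed_mb_s(size_mb, wait_s, link_mb_s):
    """Average speed as the job tracker would report it: total data
    over total elapsed time, including the wait for the map to finish."""
    transfer_s = size_mb / link_mb_s
    return size_mb / (wait_s + transfer_s)

# 100 MB of map output over a link that can move ~100 MB/s (roughly 1 Gbps).
# If the reducer sat waiting 15 minutes (900 s) for the map to finish first,
# the wire-speed transfer of 1 s barely registers in the average:
average = avg_copy_speed_mb_s(100, 900, 100)
print(f"link can do ~100 MB/s, but the reported average is ~{average:.2f} MB/s")
```

With no waiting the same function returns the full link speed, which is
why the instantaneous and average figures diverge so sharply.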
Transferring via FTP would be pointless (after all, FTP is a single
datastream of TCP and HTTP is ... a single datastream of TCP) and
there is nothing in hadoop-site.xml to tweak because the copy
processes are waiting for the source data to be created. The best
solution would be to add many more nodes to the cluster :)
Brian
On Dec 27, 2008, at 8:10 AM, d0ng wrote:
I'm seeing the same problem, the copy is too slow :(
Genady
Hi,
I've built a Hadoop cluster from two computers (master and slave),
using Hadoop 0.18.2/HBase 0.18.1.
While running Map-Reduce jobs on 5-10 GB files, I've noticed that
reduce-copy tasks from master to slave are taking too much time
(~30 minutes each), at a speed of about 0.10 MB/s, despite the fact
that master is connected to slave via a 1 Gb switch, and I did the
/etc/hosts mapping using LAN addresses (10.x.x.x).
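For reference, the /etc/hosts mapping described above looks something
like the following (the hostnames and addresses here are placeholders,
standing in for the actual 10.x.x.x LAN addresses):

```
# /etc/hosts on both machines: map each node's hostname to its LAN address
# so Hadoop traffic goes over the gigabit LAN interface.
10.0.0.1    master
10.0.0.2    slave
```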
My questions:
- Is there a way to force Hadoop to use FTP, for example, for the
copy of files?
- Is there some hadoop-site.xml configuration to improve file-copy
performance?
I've tried to copy files with FTP (master <-> slave computers) and it
works at an average speed of 50 Mb/s.
From the reduce task list web page (only slave tasks):
reduce > copy (67 of 69 at 0.89 MB/s) > : task on master
reduce > copy (29 of 69 at 0.10 MB/s) > : task on slave
Thanks in advance for any help or direction to search,
Genady