Hey,

Are all the maps done? If the reducer is waiting for the last map to finish, it can't copy that map's output yet, so the reported average copy speed goes way down.

In other words, you are comparing the theoretical instantaneous copy speed (1 Gbps) against the printed average speed, which includes the time spent waiting for all the maps to finish.

Transferring via FTP would be pointless (after all, FTP is a single TCP data stream and the HTTP shuffle is ... a single TCP data stream), and there is nothing in hadoop-site.xml to tweak, because the copy processes are waiting for the source data to be created. The best solution would be to add many more nodes to the cluster :)
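To make the arithmetic behind that printout concrete, here is a sketch with made-up numbers (the 64 MB output size and 120 s map runtime are assumptions, not from the thread): the reducer's copy clock starts before the map output exists, so the average it reports is dominated by waiting, not by the wire.

```python
# Illustrative numbers only: why the reduce-copy *average* speed can
# sit far below the link's instantaneous speed.
wire_speed_mb_s = 125.0      # ~1 Gbps link, theoretical maximum in MB/s
map_output_mb = 64.0         # size of one map output (assumed)
wait_for_map_s = 120.0       # time spent waiting for the map to finish (assumed)

transfer_s = map_output_mb / wire_speed_mb_s      # actual time on the wire
avg_speed = map_output_mb / (wait_for_map_s + transfer_s)

print(f"instantaneous: {wire_speed_mb_s:.1f} MB/s")
print(f"reported average: {avg_speed:.2f} MB/s")
```

With these numbers the copy itself takes about half a second, but the reported average lands near 0.5 MB/s, the same order of magnitude Genady is seeing.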

Brian

On Dec 27, 2008, at 8:10 AM, d0ng wrote:

I'm hitting the same problem; the copy is too slow :(

Genady
Hi,


I've built a Hadoop cluster from two computers (master and slave), using
Hadoop 0.18.2/HBase 0.18.1.

While running Map-Reduce jobs on 5-10 GB files, I've noticed that reduce-copy tasks from master to slave take too much time (~30 minutes each) at about 0.10 MB/s, despite the fact that the master is connected to the slave via a 1 Gb switch, and I set up /etc/hosts mappings using LAN addresses (10.x.x.x).


My questions:
- Is there a way to force Hadoop to use FTP, for example, to copy
files?
- Is there some hadoop-site.xml configuration to improve file copy
performance?


I've tried copying files with FTP (master <-> slave computers) and it
works at an average speed of 50 Mb/s.


From the reduce task list web page (slave tasks only):



reduce > copy (67 of 69 at 0.89 MB/s) >   : task on master
reduce > copy (29 of 69 at 0.10 MB/s) >   : task on slave


Thanks in advance for any help or direction to search,


Genady




