Can you tell which nodes were doing the computation in each case?

Date: Wed, 27 Aug 2014 20:29:38 +0530
Subject: Execution time increasing with increase of cluster size
From: [email protected]
To: [email protected]

Hi,
I've written a simple Scala program that reads a delimited file on HDFS (100 
fields, 1 million rows), splits each row on the delimiter, computes the hash 
code of each field, builds new rows from these hash codes, and writes the rows 
back to HDFS. Code attached.
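
(The attachment is not reproduced in this archive. What follows is only a 
minimal sketch of the per-row step described above, not the original code. 
The names HashFields and hashRow and the "|" delimiter are assumptions made 
for illustration.)

```scala
object HashFields {
  // Split one delimited row and replace every field with its hashCode.
  // Pattern.quote treats the delimiter literally (split takes a regex), and
  // the limit -1 keeps trailing empty fields so no columns are dropped.
  def hashRow(line: String, delimiter: String): String =
    line
      .split(java.util.regex.Pattern.quote(delimiter), -1)
      .map(_.hashCode)
      .mkString(delimiter)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample row with four fields, one of them empty.
    val sample = "a|b||d"
    println(hashRow(sample, "|")) // prints 97|98|0|100
  }
}
```

In a Spark job, a function like this would typically be applied with 
sc.textFile(input).map(line => hashRow(line, "|")).saveAsTextFile(output), 
where input and output stand for the HDFS paths.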

When I ran this on a Spark cluster of 2 nodes (these 2 nodes also form the 
HDFS cluster), it took about 35 sec to complete. Then I increased the cluster 
to 4 nodes (the additional nodes are not part of the HDFS cluster) and 
submitted the same job. I was expecting the execution time to decrease, but 
instead it took about 3 times longer (1.6 min). Snapshots of the execution 
summary are attached.

Both times I set the executor memory to 6GB, which is available on all the 
nodes.
What am I missing here? Do I need to do any additional configuration when 
increasing the cluster size?

~Sarath


                  
