Hi-
I'm working on implementing a custom algorithm using the Mahout library. The
algorithm requires matrix multiplication, which I saw was available at the
object level (.times) as well as being implemented in the
MatrixMultiplicationJob. I am currently testing a step in the algorithm that
requires me to multiply a 10 x 2.4M matrix by a 2.4M x 2.4M matrix. The
performance has been awful, taking 11-12 hours to complete. That might be
acceptable if this were the only step, but the algorithm has multiple similarly
sized steps, all of which are repeated in a loop.
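As a back-of-envelope check on why this step is so expensive, here is a small Python sketch. The shapes come from the step above; the assumption that the product is fully dense is mine (with sparse operands the real work could be much smaller):

```python
# Cost of multiplying a 10 x 2.4M matrix by a 2.4M x 2.4M matrix,
# assuming both operands are fully dense.
rows, inner, cols = 10, 2_400_000, 2_400_000

# One multiply-add per (row, inner, col) triple for a dense product.
multiply_adds = rows * inner * cols
print(f"{multiply_adds:.2e} multiply-adds")  # 5.76e+13

# The 2.4M x 2.4M operand alone, stored dense as 8-byte doubles:
bytes_dense = inner * cols * 8
print(f"{bytes_dense / 1e12:.1f} TB")  # 46.1 TB
```

So even before considering how the work is distributed, this is on the order of tens of trillions of multiply-adds per step, which is why running it on a single mapper is untenable.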
I dug into this further, looking at the job running on my Hadoop cluster
(Google Cloud Compute, 3 nodes @ 16 GB each). I noticed that the job appeared
to only be running a single map and thus on a single node, as opposed to
previous steps such as TransposeJob that ran multiple mappers and finished in a
fraction of the time. Researching it a bit further, I found a handful of
concerning posts such as the two below:
Hadoop File Splits : CompositeInputFormat : Inner Join (stackoverflow.com)
"I am using CompositeInputFormat to provide input to a hadoop job. The number of
splits generated is the total number of files given as input to
CompositeInputFormat..."
MatrixMultiplicationJob runs with 1 mapper only ? (mail-archives.apache.org)
"Hi, I am trying to multiply a dense matrix of size [100 x 100k]. The size of
the file is 104 MB, and with the default block size of 64 MB only 2 blocks are
getting created."
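If the split behavior described in that second post applies, the mapper count follows directly from file size divided by HDFS block size. A quick sketch using the numbers from that post (104 MB file, 64 MB default block size):

```python
import math

file_size_mb = 104   # file size from the quoted post
block_size_mb = 64   # default HDFS block size mentioned in the post

# Each block becomes (at most) one input split, hence one mapper.
splits = math.ceil(file_size_mb / block_size_mb)
print(splits)  # 2
```

If CompositeInputFormat instead produces one split per input file regardless of size, as the first post suggests, a matrix stored as a single file would always get a single mapper.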
So, my questions are as follows. Is MatrixMultiplicationJob truly limited to
running on a single node? If so, it seems fairly useless for matrices of this
size. And if so, what is the recommended way to perform multiplications as
large as the ones I require?
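For what it's worth, one decomposition that does parallelize naturally is writing the product as a sum of outer products over the inner dimension, so that workers can each handle a range of inner indices and the partial results are combined by summation. A toy numpy sketch with tiny stand-in shapes (this is an illustration of the idea, not Mahout code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))  # tiny stand-in for the 10 x 2.4M matrix
B = rng.standard_normal((6, 5))   # tiny stand-in for the 2.4M x 2.4M matrix

# Term k is the outer product of column k of A with row k of B.
# Summing the terms over k reproduces the full product, so the work
# can be partitioned across workers by ranges of k and reduced by addition.
partials = [np.outer(A[:, k], B[k, :]) for k in range(A.shape[1])]
C = sum(partials)

assert np.allclose(C, A @ B)
```

A MapReduce multiply built this way would map over the inner dimension (i.e., over rows of A-transposed joined with rows of B) and reduce by summing partial matrices, so the parallelism scales with the inner dimension rather than being pinned to one mapper.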