Thanks to Yahoo mail for messing up the links in my message above.  Let's try 
this again:

http://stackoverflow.com/questions/8654200/hadoop-file-splits-compositeinputformat-inner-join

 
http://mail-archives.apache.org/mod_mbox/mahout-user/201301.mbox/%3c50cfd234cc7d3a4ea1e8910d3866f700095256f...@nda-hclc-evs02.hclc.corp.hcl.in%3E

 


On Friday, November 14, 2014 9:12 AM, optimusfan <[email protected]> 
wrote:
 


Hi-

I'm implementing a custom algorithm using the Mahout library. The algorithm 
requires matrix multiplication, which I saw is available at the object level 
(.times) as well as in MatrixMultiplicationJob.  I am currently testing a step 
in the algorithm that requires me to multiply a 10x2.4m matrix by one that is 
2.4mx2.4m.  The performance has been awful: the step takes 11-12 hours to 
complete.  This might be fine if it were the whole algorithm, but I will have 
multiple similarly sized steps, all of which will be repeated in a loop.

I dug into this further, looking at the job running on my Hadoop cluster 
(Google Cloud Compute, 3 nodes @ 16 GB each).  The job appeared to be running 
only a single map task, and thus on a single node, whereas previous steps such 
as TransposeJob ran multiple mappers and finished in a fraction of the time.  
Researching it a bit further, I found a handful of concerning posts such as 
the two below:

Hadoop File Splits : CompositeInputFormat : Inner Join

MatrixMultiplicationJob runs with 1 mapper only ?

So, my questions are as follows.  Is MatrixMultiplicationJob truly limited to 
running on a single node?  If so, it seems fairly useless for matrices of this 
size.  And in that case, what is the recommended way to do a decently sized 
multiplication such as the one I require?
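
For reference, here is a minimal sketch of why I would expect this product to 
spread across mappers. It is plain Python rather than Mahout's Java, and the 
names partial_product and merge are hypothetical, not Mahout APIs. It assumes 
the large right-hand matrix is stored as sparse rows. The product A*B is the 
sum over k of (column k of A) times (row k of B), so any subset of rows of B 
can be processed independently and the partial results added afterwards. I am 
not claiming this is how MatrixMultiplicationJob works internally.

```python
from collections import defaultdict


def partial_product(a_cols, b_rows):
    """Contribution of one chunk of B's rows to the product A*B.

    a_cols: {k: [a_0k, ..., a_(m-1)k]}  column k of the small left matrix A
    b_rows: {k: {j: b_kj}}              sparse row k of the large right matrix B
    Returns {(i, j): value} for the rows of B present in this chunk only.
    """
    out = defaultdict(float)
    for k, row in b_rows.items():
        col = a_cols.get(k)
        if col is None:
            continue  # no matching column of A, nothing to add
        for i, a_ik in enumerate(col):
            if a_ik == 0:
                continue  # skip zero entries of A
            for j, b_kj in row.items():
                out[(i, j)] += a_ik * b_kj
    return dict(out)


def merge(parts):
    """Sum the partial results produced by independent chunks."""
    total = defaultdict(float)
    for part in parts:
        for key, value in part.items():
            total[key] += value
    return dict(total)


# Toy stand-in for the real sizes: A is 2x3, B is 3x3 and sparse.
a_cols = {0: [1.0, 4.0], 1: [2.0, 5.0], 2: [3.0, 6.0]}
b_rows = {0: {0: 1.0}, 1: {1: 2.0, 2: 1.0}, 2: {0: 3.0}}

# Split B's rows into two chunks, as two mappers might see them.
chunk1 = {k: b_rows[k] for k in (0, 1)}
chunk2 = {k: b_rows[k] for k in (2,)}

product = merge([partial_product(a_cols, chunk1),
                 partial_product(a_cols, chunk2)])
```

With only 10 rows in the left matrix, each chunk's partial result has at most 
10 rows, so the merge step stays small compared with the scan over the rows of 
the 2.4mx2.4m matrix.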
