You may want to switch to the Scala DSL if you are planning more linear algebra. The DSL runs on Spark and so is much faster than the older Hadoop code, and it also sits on top of a linear algebra optimizer. The type of thing you mention below is a few lines that can be run interactively in the Mahout Scala shell, or can be put in your own driver just as easily.
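For example, a distributed multiply like the one you describe looks roughly like this in the Mahout Scala shell. This is only a sketch: the tiny dense matrices are placeholders for your real operands, and it assumes you are inside a Spark-backed Mahout shell where the distributed row matrix (DRM) bindings are already in scope.

```scala
// Inside the Mahout Spark shell (bin/mahout spark-shell):

// Toy in-core matrices standing in for your real data
val mxA = dense((1, 2, 3), (4, 5, 6))      // 2 x 3
val mxB = dense((1, 0), (0, 1), (1, 1))    // 3 x 2

// Promote them to distributed row matrices (DRMs)
val drmA = drmParallelize(mxA)
val drmB = drmParallelize(mxB)

// Distributed multiply; the optimizer chooses the physical
// plan and partitioning, so the work is spread across executors
val drmC = drmA %*% drmB

// Bring the (small) result back in-core to inspect it
val mxC = drmC.collect
```

For operands the size of yours you would read them from DFS with drmDfsRead(...) rather than drmParallelize, but the %*% expression itself stays the same.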
http://mahout.apache.org/users/sparkbindings/home.html
http://mahout.apache.org/users/sparkbindings/play-with-shell.html

On Nov 14, 2014, at 7:41 AM, optimusfan <[email protected]> wrote:

Thanks to Yahoo mail for messing up the links in my message above. Let's try this again:

http://stackoverflow.com/questions/8654200/hadoop-file-splits-compositeinputformat-inner-join
http://mail-archives.apache.org/mod_mbox/mahout-user/201301.mbox/%3c50cfd234cc7d3a4ea1e8910d3866f700095256f...@nda-hclc-evs02.hclc.corp.hcl.in%3E

On Friday, November 14, 2014 9:12 AM, optimusfan <[email protected]> wrote:

Hi-

I'm working on implementing a custom algorithm using the Mahout library. The algorithm requires matrix multiplication, which I saw was available at the object level (.times) as well as being implemented in MatrixMultiplicationJob. I am currently testing a step in the algorithm that requires me to multiply a 10x2.4m matrix by one that is 2.4mx2.4m. The performance has been awful, taking 11-12 hours to complete. This might be fine if it were the extent of the algorithm, but I will have multiple similarly sized steps, all of which will be repeated in a loop.

I dug into this further, looking at the job running on my Hadoop cluster (Google Cloud Compute, 3 nodes @ 16 GB each). I noticed that the job appeared to run only a single map task, and thus on a single node, as opposed to previous steps such as TransposeJob that ran multiple mappers and finished in a fraction of the time. Researching it a bit further, I found a handful of concerning posts, such as the two below:

"Hadoop File Splits : CompositeInputFormat : Inner Join" (stackoverflow.com): "I am using CompositeInputFormat to provide input to a hadoop job. The number of splits generated is the total number of files given as input to CompositeInputFormat..."

"MatrixMultiplicationJob runs with 1 mapper only ?" (mail-archives.apache.org): "Hi, I am trying to multiply dense matrices of size [100 x 100k]. The size of the file is 104MB and with the default block size of 64MB only 2 blocks are getting created."

So, my questions are as follows. Is MatrixMultiplicationJob truly limited to running on a single node? If so, it seems fairly useless. And in that case, what is the recommended way to do decently sized multiplications such as I require?
