There is a bug in Mahout 0.10.0 that you can fix if you are able to build from 
source. Get the source tarball for 0.10.0, not the current master.

Go to 
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

and remove the line that says: interactions.collect()

See this JIRA issue: https://issues.apache.org/jira/browse/MAHOUT-1707
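
A minimal sketch of that edit, assuming you have unpacked the 0.10.0 source and 
that the call sits on a line by itself (check with grep first). The names SRC 
and remove_collect are made up for this example; the path is the one from the 
link above.

```shell
# Path of the file inside the Mahout source tree (taken from the GitHub link).
SRC=spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala

# Hypothetical helper: delete any line that contains only the
# interactions.collect() call. A .bak copy of the original file is kept.
remove_collect() {
  sed -i.bak '/^[[:space:]]*interactions\.collect()[[:space:]]*$/d' "$1"
}
```

From the root of the source tree you would run remove_collect "$SRC" and then 
rebuild, for example with mvn -DskipTests clean install.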

There is one other thing that can cause this, which is fixed by increasing your 
client JVM heap space, but try the above first.

BTW, setting the executor memory twice (with -sem and with 
-D:spark.executor.memory) is not necessary.
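
For example, the command from your mail with the executor memory given only 
once might look like the sketch below. It assumes -sem alone is enough, and 
MAHOUT_BIN and run_rowsimilarity are just names made up for this example.

```shell
# Location of the mahout launcher; defaults to the path used in the original mail.
MAHOUT_BIN="${MAHOUT_BIN:-$HOME/mahout-distribution-0.10.0/bin/mahout}"

# Hypothetical wrapper: run spark-rowsimilarity with a single executor-memory
# setting. Arguments: input path, output path, executor memory (default 24g).
run_rowsimilarity() {
  "$MAHOUT_BIN" spark-rowsimilarity \
    --input "$1" \
    --output "$2" \
    -sem "${3:-24g}"
}
```

You would call it as run_rowsimilarity ~/query_result.tsv ~/work/result 24g.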


On May 13, 2015, at 2:21 AM, Xavier Rampino <[email protected]> wrote:

Hello,

I've tried spark-rowsimilarity with out-of-the-box setup (downloaded mahout
distribution and spark, and set up the PATH), and I stumble upon a Java
Heap space error. My input file is ~100MB. It seems the various parameters
I tried to give won't change this. I do :

~/mahout-distribution-0.10.0/bin/mahout spark-rowsimilarity --input
~/query_result.tsv --output ~/work/result -sem 24g
-D:spark.executor.memory=24g

Do I just need to input more memory, or is there another step I can do to
solve this ?
