There is a bug in Mahout 0.10.0 that you can fix if you are able to build from source. Get the source tar for 0.10.0, not the current master.
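
A rough sketch of the patch-and-rebuild, to be run from the root of the unpacked 0.10.0 source tree. The file path and the line to delete are the ones named in the link and Jira that follow. Two things here are my assumptions: that the call sits on a line of its own, and that skipping tests during the build is acceptable. Check the grep output before trusting the sed.

```shell
# Path taken from the GitHub link; run from the root of the 0.10.0 source tree.
FILE="spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala"

if [ -f "$FILE" ]; then
  # Show where the call is, so you can confirm it is the line you expect.
  grep -n 'interactions\.collect()' "$FILE"

  # Delete only lines that consist of the bare call (assumption: it stands alone).
  # A backup of the original is kept as $FILE.bak.
  sed -i.bak '/^[[:space:]]*interactions\.collect()[[:space:]]*$/d' "$FILE"

  # Rebuild; skipping tests is an assumption made to shorten the build.
  mvn -DskipTests clean install
else
  echo "run this from the root of the Mahout 0.10.0 source tree" >&2
fi
```

If grep shows the call embedded in a longer statement, the sed will leave it alone and you should edit the file by hand instead.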
Go to https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157 and remove the line that says:

interactions.collect()

See this Jira: https://issues.apache.org/jira/browse/MAHOUT-1707

There is one other thing that can cause this, and it is fixed by increasing your client JVM heap space, but try the above first.

BTW, setting the executor memory twice (-sem and -D:spark.executor.memory) is not necessary.

On May 13, 2015, at 2:21 AM, Xavier Rampino <[email protected]> wrote:

Hello,

I've tried spark-rowsimilarity with the out-of-the-box setup (downloaded the mahout distribution and spark, and set up the PATH), and I stumble upon a Java heap space error. My input file is ~100MB. It seems the various parameters I tried to give won't change this. I do:

~/mahout-distribution-0.10.0/bin/mahout spark-rowsimilarity --input ~/query_result.tsv --output ~/work/result -sem 24g -D:spark.executor.memory=24g

Do I just need to give it more memory, or is there another step I can take to solve this?
