OK, this makes sense. When people see Out of Memory problems they naturally try
to give more memory to the process throwing the exception, but what is often
happening is that you have given too much to the collection of other processes
on the machine, so there is not enough to go around and the allocation fails.
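For example, on YARN a single executor container requests spark.executor.memory
plus the YARN overhead, and that total has to fit under what each NodeManager
advertises; a rough per-node budget (Spark 1.x / YARN property names, numbers
are only illustrative):

    # yarn-site.xml: memory a NodeManager can hand out to containers (illustrative)
    yarn.nodemanager.resource.memory-mb = 12288

    # What one executor container actually asks YARN for is
    #   spark.executor.memory + spark.yarn.executor.memoryOverhead
    # The overhead defaults to a fraction of executor memory (at least 384 MB),
    # so 10g of executor memory really costs roughly 11g of the node's 12288m.
    spark.executor.memory              = 10g
    spark.yarn.executor.memoryOverhead = 1024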
I was able to make it work by setting the executor memory to 10g
and passing -D:spark.dynamicAllocation.enabled=true :
mahout spark-rowsimilarity --input hdfs:/indata/row-similarity.tsv
--output rowsim-out --omitStrength --sparkExecutorMem 10g --master
yarn-client -D:spark.dynamicAllocation.enabled=true
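Worth noting for anyone copying that command: on YARN, dynamic allocation also
relies on the external shuffle service running on the node managers, which is
an assumption about the cluster rather than something the flag above turns on.
A minimal sketch of the full invocation, assuming the driver accepts multiple
-D: options:

    mahout spark-rowsimilarity --input hdfs:/indata/row-similarity.tsv \
      --output rowsim-out --omitStrength --sparkExecutorMem 10g \
      --master yarn-client \
      -D:spark.dynamicAllocation.enabled=true \
      -D:spark.shuffle.service.enabled=true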
Hello,
I have the same problem described above using spark-rowsimilarity.
I have a ~65k-line input file (each row with fewer than 300 items),
and I run the job on a small cluster with 1 master and 2 workers; each
machine has 15 GB of RAM.
I tried to increase executor and driver memory:
--sparkExecut