Re: Running Wordcount on large file stucks and throws OOM exception

2014-09-03 Thread Zhan Zhang
In word count, you don’t need much driver memory unless you call collect, which is not recommended.

    val file = sc.textFile("hdfs://sandbox.hortonworks.com:8020/tmp/data")
    val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://sa

Re: Running Wordcount on large file stucks and throws OOM exception

2014-08-26 Thread motte1988
Hello, it's me again. I now have an explanation for the behaviour. It seems that the driver memory is not large enough to hold the whole result set of saveAsTextFile in memory, and then the OOM occurs. I tested it with a filter step that removes KV pairs with a word count smaller than 100,000. So now the job
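
For readers following the thread, the filter step described in this message could be sketched roughly as below. This is only an illustration, not the poster's actual code: the object name FilteredWordCount, the helper keep, the constant MinCount, and the two command-line path arguments are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings the pair-RDD implicits (reduceByKey) into scope on Spark versions
// before 1.3; later versions should not need it.
import org.apache.spark.SparkContext._

object FilteredWordCount {
  // Threshold taken from the message: drop words counted fewer than 100,000 times.
  val MinCount = 100000

  // Hypothetical helper: true for (word, count) pairs that should be kept.
  def keep(pair: (String, Int)): Boolean = pair._2 >= MinCount

  def main(args: Array[String]): Unit = {
    // Assumption: input and output paths are passed as the two arguments.
    val inputPath = args(0)
    val outputPath = args(1)

    val sc = new SparkContext(new SparkConf().setAppName("FilteredWordCount"))
    try {
      val counts = sc.textFile(inputPath)
        .flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      // Filter on the cluster, then write the surviving pairs out.
      counts.filter(keep).saveAsTextFile(outputPath)
    } finally {
      sc.stop()
    }
  }
}
```

The filter runs as a transformation on the RDD, so only the pairs that pass the threshold are written by saveAsTextFile; nothing in this sketch calls collect.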