If memory were not enough, an OutOfMemoryError should occur, but there is
nothing in the driver's log.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/something-about-rdd-collect-tp16451p16461.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Thanks rxin,
I still have a doubt about collect().
The number of words before reduceByKey() is about 200 million, and after
reduceByKey() it decreases to about 18 million.
The driver's memory is initialized to 15 GB, and printing
runtime.freeMemory() before reduceByKey() shows 13 GB of free memory.
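For a rough sense of scale (my own back-of-envelope numbers, not from the thread): even if each of the 18 million post-reduceByKey pairs cost on the order of 100 bytes on the driver heap (the per-pair cost is an assumption covering a short String key, a boxed count, and tuple overhead), the collected result would be under 2 GB, well within 13 GB of free memory, so a straightforward OOM is not the obvious explanation here.

```python
# Back-of-envelope estimate of driver memory needed to hold the
# collected word-count result. BYTES_PER_PAIR is an assumption
# (JVM String key + boxed count + tuple overhead), not a measurement.
PAIRS_AFTER_REDUCE = 18_000_000
BYTES_PER_PAIR = 100  # assumed average cost per (word, count) pair

total_gb = PAIRS_AFTER_REDUCE * BYTES_PER_PAIR / 1024**3
print(f"~{total_gb:.1f} GB to hold the collected result")  # ~1.7 GB
```

Even when the final result fits, serializing and shipping all partitions to the driver can still take a long time, which would look like collect() hanging.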
Hi Randy,
collect essentially transfers all the data to the driver node. You definitely
wouldn’t want to collect 200 million words. It is a pretty large number and you
can run out of memory on your driver with that much data.
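To make the point concrete, here is a minimal plain-Python sketch (no Spark; the tiny `documents` input is made up for illustration) of what the pipeline in this thread computes. The final step corresponds to collect(): materializing the entire result in one process, which is exactly what exhausts driver memory at scale.

```python
# Plain-Python analogue of:
# documents.flatMap(words => words.map(w => (w, 1))).reduceByKey(_ + _).collect()
from collections import Counter

documents = [["a", "b", "a"], ["b", "c"]]  # stand-in for the RDD of token lists

# flatMap: one (word, 1) pair per token
pairs = [(w, 1) for words in documents for w in words]

# reduceByKey(_ + _): sum the counts per word
counts = Counter()
for w, n in pairs:
    counts[w] += n

# collect(): pull the whole result into local memory at once
collected = sorted(counts.items())
print(collected)  # [('a', 2), ('b', 2), ('c', 1)]
```

With a distributed dataset, alternatives that avoid materializing everything on the driver (for example sampling a prefix with take(n), or writing results out with saveAsTextFile) are usually safer than collect() at this scale.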
--
Reynold Xin
On October 14, 2014 at 9:26:13 PM, randylu (randyl.
My code is as follows:
documents.flatMap(words => words.map(w => (w, 1))).reduceByKey(_ + _).collect()
In the driver's log, reduceByKey() has finished, but collect() seems to keep
running and never finishes.
In addition, there are about 200,000,000 words that need to be collected. Is
it t