I would like to clarify something. Matei mentioned that in Spark 1.0, groupBy
returns a (Key, Iterable[Value]) instead of a (Key, Seq[Value]). Does this
also guarantee that the whole Iterable[Value] is not in fact materialized in
memory? That is to say, with 1.0, will it be possible to do
groupByKey().values.map { x => val it = x.iterator; while (it.hasNext) ... }
when x: Iterable[Value] is larger than the RAM on a single machine? Or will
this only become possible in a later release?

Could you please propose a workaround for this in the meantime? I'm out of
ideas.
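[Editor's note: one workaround pattern sometimes suggested for this situation, not from this thread itself, is to avoid groupByKey entirely: sort the RDD by key (e.g. with sortByKey) and then walk each partition with mapPartitions, folding over each key's consecutive run of values as a stream instead of materializing the group. A minimal plain-Scala sketch of the per-partition folding step follows; the object and method names are hypothetical, and the Spark wiring (sortByKey + mapPartitions) is omitted.]

```scala
object SortedGroupSketch {
  /** Fold each key's consecutive run of values without buffering the run.
    * Assumes the input iterator is already sorted (or at least clustered)
    * by key, as it would be inside a partition after sortByKey. */
  def runsByKey[K, V, A](sorted: Iterator[(K, V)], zero: A)(op: (A, V) => A): List[(K, A)] = {
    val out = scala.collection.mutable.ListBuffer.empty[(K, A)]
    val it  = sorted.buffered          // buffered lets us peek at the next key
    while (it.hasNext) {
      val key = it.head._1
      var acc = zero
      // Consume this key's run one value at a time; only the accumulator is held.
      while (it.hasNext && it.head._1 == key) acc = op(acc, it.next()._2)
      out += ((key, acc))
    }
    out.toList
  }
}
```

The catch is that this only helps when the per-key work can be expressed as a streaming fold (sum, count, top-N, etc.); if you truly need random access to all of a key's values at once, sorting alone does not remove the memory requirement.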

Thanks,
Nilesh



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/all-values-for-a-key-must-fit-in-memory-tp6342p6791.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.