I would like to clarify something. Matei mentioned that in Spark 1.0, groupBy returns a (Key, Iterable[Value]) instead of a (Key, Seq[Value]). Does this also guarantee that the whole Iterable[Value] is not in fact stored in memory? That is to say, with 1.0, will it be possible to do something like groupByKey().values.map { it => val x = it.iterator; while (x.hasNext) ... }, assuming it : Iterable[Value] is larger than the RAM on a single machine? Or will this only become possible in a subsequent version?
Could you please propose a workaround for this in the meantime? I'm out of ideas.

Thanks,
Nilesh

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/all-values-for-a-key-must-fit-in-memory-tp6342p6791.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.