Actually, I disagree that combineByKey requires that all values be
held in memory for a key. Only groupByKey does that, whereas
reduceByKey, foldByKey, and the generic combineByKey do not necessarily
make that requirement. If your combine logic really shrinks the result,
the per-key aggregate stays small no matter how many values a key has.
OK, so in Java - pardon the verbosity - I might say something like the code
below, but I face the following issues:
1) I need to store all values in memory as I run combineByKey - if I could
return an RDD which consumed values, that would be great, but I don't know
how to do that.
2) In my version of the
So sorry about teasing you with the Scala. But the method is there in Java
too, I just checked.
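As a rough sketch of the Java shape (the String/Integer/Long types and the
summing bodies here are placeholders standing in for your V and S, not code
from the original message):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class CombineByKeySketch {
  static JavaPairRDD<String, Long> sumPerKey(JavaPairRDD<String, Integer> rdd) {
    // createCombiner: turn the first value seen for a key into the aggregate type ("zero").
    Function<Integer, Long> createCombiner = v -> v.longValue();
    // mergeValue: fold one more value into a partition-local aggregate ("reduce").
    Function2<Long, Integer, Long> mergeValue = (agg, v) -> agg + v;
    // mergeCombiners: merge partial aggregates from different partitions ("merge").
    Function2<Long, Long, Long> mergeCombiners = (a, b) -> a + b;
    return rdd.combineByKey(createCombiner, mergeValue, mergeCombiners);
  }
}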
On Fri, Sep 19, 2014 at 2:02 PM, Victor Tso-Guillen wrote:
It might not be the same as a real Hadoop reducer, but I think it would
accomplish the same. Take a look at:
import org.apache.spark.SparkContext._
// val rdd: RDD[(K, V)]
// def zero(value: V): S
// def reduce(agg: S, value: V): S
// def merge(agg1: S, agg2: S): S
val reducedUnsorted: RDD[(K, S)] = rdd.combineByKey(zero, reduce, merge)
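The three commented signatures line up with combineByKey's createCombiner,
mergeValue, and mergeCombiners parameters, so only one S per key is held as
values are folded in. Presumably the result would then be sorted, e.g. with
reducedUnsorted.sortByKey(), to mimic the sorted keys a Hadoop reducer sees;
that last step is only implied by the name reducedUnsorted, not stated above.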