Actually, I disagree that combineByKey requires that all values be
held in memory for a key. Only groupByKey does that, whereas
reduceByKey, foldByKey, and the generic combineByKey do not necessarily
make that requirement. If your combine logic really shrinks the result,
the per-key aggregate stays small no matter how many values a key has.
OK, so in Java - pardon the verbosity - I might say something like the code
below, but I face the following issues:
1) I need to store all values in memory as I run combineByKey - if I could
return an RDD which consumed values, that would be great, but I don't know
how to do that.
2) In my version of the
So sorry about teasing you with the Scala. But the method is there in Java
too, I just checked.
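As a rough sketch of the Java shape (the String/Integer/Long types and the
summing bodies here are placeholders standing in for your V and S, not code
from the original message):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class CombineByKeySketch {
  static JavaPairRDD<String, Long> sumPerKey(JavaPairRDD<String, Integer> rdd) {
    // createCombiner: turn the first value seen for a key into the aggregate type ("zero").
    Function<Integer, Long> createCombiner = v -> v.longValue();
    // mergeValue: fold one more value into a partition-local aggregate ("reduce").
    Function2<Long, Integer, Long> mergeValue = (agg, v) -> agg + v;
    // mergeCombiners: merge partial aggregates from different partitions ("merge").
    Function2<Long, Long, Long> mergeCombiners = (a, b) -> a + b;
    return rdd.combineByKey(createCombiner, mergeValue, mergeCombiners);
  }
}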
On Fri, Sep 19, 2014 at 2:02 PM, Victor Tso-Guillen wrote:
It might not be the same as a real Hadoop reducer, but I think it would
accomplish the same. Take a look at:
import org.apache.spark.SparkContext._
// val rdd: RDD[(K, V)]
// def zero(value: V): S
// def reduce(agg: S, value: V): S
// def merge(agg1: S, agg2: S): S
val reducedUnsorted: RDD[(K, S)] = rdd.combineByKey(zero, reduce, merge)
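The three commented signatures line up with combineByKey's createCombiner,
mergeValue, and mergeCombiners parameters, so only one S per key is held as
values are folded in. Presumably the result would then be sorted, e.g. with
reducedUnsorted.sortByKey(), to mimic the sorted keys a Hadoop reducer sees;
that last step is only implied by the name reducedUnsorted, not stated above.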