Subject: aggregateByKey vs combineByKey
From: mmistr...@gmail.com
To: user@spark.apache.org
Hi all,
I have the following dataset:
kv = [(2,Hi), (1,i), (2,am), (1,a), (4,test), (6,string)]
It's a simple list of tuples containing (word_length, word).
What I wanted to do was to group the result by key in order to have a result of the form
[(word_length_1, [word1, word2, word3]), (word_length_2, [...]), ...]
Looking at PairRDDFunctions.scala:
def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(
    seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)] = self.withScope {
  ...
  combineByKeyWithClassTag[U]((v: V) => cleanedSeqOp(createZero(), v),
    cleanedSeqOp, combOp, partitioner)
}
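
For reference, a minimal sketch of that grouping done with aggregateByKey (Scala RDD API; the SparkContext setup, the local master and the object name are assumptions made only so the snippet is self-contained):

import org.apache.spark.{SparkConf, SparkContext}

object GroupWordsByLength {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("group-by-length").setMaster("local[*]"))

    // The example dataset from the question, as an RDD[(Int, String)].
    val kv = sc.parallelize(Seq((2, "Hi"), (1, "i"), (2, "am"), (1, "a"), (4, "test"), (6, "string")))

    // aggregateByKey: start each key from an empty List[String] (the zero value),
    // prepend values within a partition (seqOp), and concatenate the partial lists
    // coming from different partitions (combOp).
    val grouped = kv.aggregateByKey(List.empty[String])(
      (acc, word) => word :: acc,
      (left, right) => left ::: right
    )

    grouped.collect().foreach(println)  // e.g. (2,List(am, Hi)), (4,List(test)), ...
    sc.stop()
  }
}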
Thanks Liquan, that was really helpful.
On Mon, Sep 29, 2014 at 5:54 PM, Liquan Pei wrote:
> Hi Dave,
>
> You can replace groupByKey with reduceByKey to improve performance in some
> cases. reduceByKey performs a map-side combine, which can reduce network IO
> and shuffle size, whereas groupByKey will not perform a map-side combine.
Hi Dave,
You can replace groupByKey with reduceByKey to improve performance in some
cases. reduceByKey performs a map-side combine, which can reduce network IO
and shuffle size, whereas groupByKey will not perform a map-side combine.
combineByKey is more general than aggregateByKey. Actually, the
implementation of aggregateByKey, reduceByKey and groupByKey is achieved by
combineByKey.
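
A small sketch of the difference Liquan describes, assuming the same RDD[(Int, String)] of (word_length, word) pairs as above (the helper name is only illustrative):

import org.apache.spark.rdd.RDD

// reduceByKey merges values per key inside each map partition before the shuffle
// (map-side combine), so less data crosses the network; groupByKey ships every raw
// (key, value) pair to the reducers and only then groups them.
def concatWordsBothWays(kv: RDD[(Int, String)]): (RDD[(Int, String)], RDD[(Int, String)]) = {
  val viaReduce = kv.reduceByKey((a, b) => a + "," + b)        // map-side combine
  val viaGroup  = kv.groupByKey().mapValues(_.mkString(","))   // no map-side combine
  (viaReduce, viaGroup)
}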
Hi All,
After some hair pulling, I've reached the realisation that an operation I
am currently doing via:
myRDD.groupByKey.mapValues(func)
should be done more efficiently using aggregateByKey or combineByKey. Both
of these methods would do, and they seem very similar to me in terms of
their functionality.
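
A hypothetical sketch of that rewrite, assuming for illustration that func simply sums the grouped values (myRDD, its key type and the summing func are assumptions, not taken from the original mail):

import org.apache.spark.rdd.RDD

// Both forms push the aggregation into the shuffle instead of materialising each
// group first, which is what makes them cheaper than groupByKey.mapValues(func).
def sumPerKey(myRDD: RDD[(String, Int)]): (RDD[(String, Int)], RDD[(String, Int)]) = {
  // aggregateByKey: zero value, per-partition fold (seqOp), cross-partition merge (combOp).
  val viaAggregate = myRDD.aggregateByKey(0)(_ + _, _ + _)

  // combineByKey: the most general form; it also controls how the first value seen
  // for a key becomes an accumulator (createCombiner).
  val viaCombine = myRDD.combineByKey(
    (v: Int) => v,                   // createCombiner
    (acc: Int, v: Int) => acc + v,   // mergeValue, within a partition
    (a: Int, b: Int) => a + b        // mergeCombiners, across partitions
  )
  (viaAggregate, viaCombine)
}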