Re: Filtering keys after map+combine

2015-02-19 Thread Debasish Das
Hi Sean,

This is what I intend to do: "are you saying that you know a key should be filtered based on its value partway through the merge?"

I should use combineByKey...

Thanks.
Deb

On Thu, Feb 19, 2015 at 6:31 AM, Sean Owen wrote:
> You have the keys before and after reduceByKey. You want t

Re: Filtering keys after map+combine

2015-02-19 Thread Debasish Das
I thought the combiner comes from reduceByKey and not from mapPartitions, right? Let me dig deeper into the APIs.

On Thu, Feb 19, 2015 at 8:29 AM, Daniel Siegmann wrote:
> I'm not sure what your use case is, but perhaps you could use
> mapPartitions to reduce across the individual partitions and apply you

Re: Filtering keys after map+combine

2015-02-19 Thread Daniel Siegmann
I'm not sure what your use case is, but perhaps you could use mapPartitions to reduce across the individual partitions and apply your filtering. Then you can finish with a reduceByKey.

On Thu, Feb 19, 2015 at 9:21 AM, Debasish Das wrote:
> Hi,
>
> Before I send out the keys for network shuffle,
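
A minimal plain-Python sketch of the flow suggested in this message, not Spark code: each inner list stands in for one partition, `combine_partition` plays the role of the function passed to mapPartitions, and `final_reduce` plays the role of the closing reduceByKey. The function names, the sample data, and the filter rule (keep a key when its partial sum is at least `threshold`) are assumptions made for illustration.

```python
from collections import defaultdict

def combine_partition(records):
    """Sum values per key within one partition (the map-side combine)."""
    partial = defaultdict(int)
    for key, value in records:
        partial[key] += value
    return list(partial.items())

def filter_partial(pairs, threshold):
    """Keep keys whose per-partition partial sum reaches the threshold (assumed rule)."""
    return [(key, value) for key, value in pairs if value >= threshold]

def final_reduce(partitions_of_pairs):
    """Merge the surviving partial sums across partitions (the post-shuffle reduce)."""
    total = defaultdict(int)
    for pairs in partitions_of_pairs:
        for key, value in pairs:
            total[key] += value
    return dict(total)

# Two hypothetical partitions of (key, value) records.
partitions = [
    [("a", 3), ("b", 1), ("a", 4)],
    [("a", 2), ("b", 1), ("c", 9)],
]

# Combine and filter inside each partition, then reduce what is left.
survivors = [filter_partial(combine_partition(p), threshold=2) for p in partitions]
result = final_reduce(survivors)
print(result)
```

One thing to watch with this approach: the filter only sees per-partition partial sums. In the sample data, `b` totals 2 across both partitions but is dropped because each partial sum is 1, so whether filtering at this stage is safe depends on the threshold rule.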

Re: Filtering keys after map+combine

2015-02-19 Thread Sean Owen
You have the keys before and after reduceByKey. You want to do something based on the key "within" reduceByKey? It just calls combineByKey, so you can use that method for lower-level control over the merging.

Whether it's possible depends, I suppose, on what you mean to filter on. If it's just a pro
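
A toy plain-Python model of the relationship described in this message, offered as a sketch rather than Spark's actual implementation. The three callbacks mirror combineByKey's `createCombiner`, `mergeValue`, and `mergeCombiners` arguments; the helper names `combine_by_key` and `reduce_by_key`, the in-memory partitions, and the sample data are assumptions for illustration.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Toy model of combineByKey semantics over in-memory partitions."""
    # Map side: build one combiner per key within each partition.
    per_partition = []
    for records in partitions:
        combiners = {}
        for key, value in records:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create_combiner(value)
        per_partition.append(combiners)

    # Reduce side: merge the combiners for the same key across partitions.
    merged = {}
    for combiners in per_partition:
        for key, combiner in combiners.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged

def reduce_by_key(partitions, func):
    """reduceByKey expressed through the model: the value is its own combiner."""
    return combine_by_key(partitions, lambda v: v, func, func)

# Two hypothetical partitions of (key, value) records.
partitions = [
    [("a", 3), ("b", 1), ("a", 4)],
    [("a", 2), ("c", 9)],
]

sums = reduce_by_key(partitions, lambda x, y: x + y)

# Lower-level control: carry a (sum, count) pair per key instead of a bare sum.
sum_counts = combine_by_key(
    partitions,
    lambda v: (v, 1),
    lambda c, v: (c[0] + v, c[1] + 1),
    lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]),
)
print(sums)
print(sum_counts)
```

In this model the three callbacks can only return a combiner, so they cannot drop a key on their own; a combiner could carry a marker that a later filter step acts on.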

Filtering keys after map+combine

2015-02-19 Thread Debasish Das
Hi,

Before I send out the keys for network shuffle, in reduceByKey after map + combine are done, I would like to filter the keys based on some threshold...

Is there a way to get the key, value after map+combine stages so that I can run a filter on the keys?

Thanks.
Deb