Hi Sean,

This is what I intend to do:

"are you saying that you know a key should be filtered based on its value
partway through the merge?"

I should use combineByKey...
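
Roughly along these lines, as a sketch only. It assumes the values are non-negative counts that get summed, and that a key should be dropped as soon as its running sum passes a cutoff. The names `ThresholdCombine`, `threshold` and `combineLocally` are made up for illustration, and the local fold only stands in for Spark's map-side combine so the three functions can be tried without a cluster.

```scala
object ThresholdCombine {
  // Hypothetical cutoff: a key is dropped once its running sum exceeds this.
  val threshold = 10L

  // None marks a key that has already crossed the threshold.
  type Acc = Option[Long]

  private def cap(sum: Long): Acc = if (sum > threshold) None else Some(sum)

  // First value seen for a key within a partition.
  val createCombiner: Long => Acc = v => cap(v)

  // Fold one more value into the partition-local accumulator.
  // Once the accumulator is None it stays None.
  val mergeValue: (Acc, Long) => Acc = (acc, v) => acc.flatMap(s => cap(s + v))

  // Merge accumulators from two partitions; None if either side is None.
  val mergeCombiners: (Acc, Acc) => Acc = (a, b) =>
    for (x <- a; y <- b; r <- cap(x + y)) yield r

  // Local stand-in for the map-side combine of a single partition.
  def combineLocally(part: Seq[(String, Long)]): Map[String, Acc] =
    part.foldLeft(Map.empty[String, Acc]) { case (m, (k, v)) =>
      m.updated(k, m.get(k).map(mergeValue(_, v)).getOrElse(createCombiner(v)))
    }

  // With a real RDD[(String, Long)] named rdd (assumed), the same functions
  // would plug into Spark like this:
  //   rdd.combineByKey(createCombiner, mergeValue, mergeCombiners)
  //      .flatMap { case (k, acc) => acc.map(k -> _) }
}
```

Keys that cross the cutoff inside a partition are still shuffled, but only as (key, None), and the final flatMap drops them. The early cutoff is only safe because the sum can never go back down. With negative values the filter would have to wait for the full reduction.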

Thanks.
Deb


On Thu, Feb 19, 2015 at 6:31 AM, Sean Owen <so...@cloudera.com> wrote:

> You have the keys before and after reduceByKey. Do you want to do
> something based on the key "within" reduceByKey? It just calls
> combineByKey, so you can use that method for lower-level control over
> the merging.
>
> Whether it's possible depends, I suppose, on what you mean to filter on.
> If it's just a property of the key, can't you just filter before
> reduceByKey? If it's a property of the key's value, don't you need to
> wait for the reduction to finish? Or are you saying that you know a
> key should be filtered based on its value partway through the merge?
>
> I suppose you can use combineByKey with a mergeValue function that
> turns an input of type A into an Option[B]: you output None once your
> criterion is met, and your mergeCombiners function returns None if
> either argument is None. It doesn't save 100% of the work, but it may
> mean you only shuffle (key,None) for some keys if the map-side
> combine already worked out that the key would be filtered.
>
> Afterwards, run a flatMap or something similar to turn Option[B] into B.
>
> On Thu, Feb 19, 2015 at 2:21 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
> > Hi,
> >
> > Before I send out the keys for network shuffle, in reduceByKey after map +
> > combine are done, I would like to filter the keys based on some threshold...
> >
> > Is there a way to get the key, value after map+combine stages so that I can
> > run a filter on the keys?
> >
> > Thanks.
> > Deb
>
