[ 
https://issues.apache.org/jira/browse/PIG-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-5260.
-------------------------------------
    Resolution: Invalid

This is totally invalid. Writing something when you totally lack sleep is a bad 
thing. I had interchanged the left-side load vertex with the join vertex. 
It can still be valid for the case where the left side is an intermediate reducer 
and the result of a group by on the same key as the join key. But that case is 
not worth the effort.

> Separate bloom filter for each reducer of the join
> --------------------------------------------------
>
>                 Key: PIG-5260
>                 URL: https://issues.apache.org/jira/browse/PIG-5260
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
>    Currently bloom join allows specifying the number of bloom filters, and all 
> of them are broadcast to each join vertex. The bloom filter partition logic 
> is joinkey hashcode % num_filters. The reducer partition logic is joinkey 
> hashcode % num_reducers. If we made the number of bloom filters equal to the 
> number of reducers in the join, we could just broadcast bloom filter 0 to 
> reducer 0, bloom filter 1 to reducer 1, and so on. A one-one edge will most 
> likely prevent auto-reduce parallelism from being applied for the 
> scatter-gather edge, so we need to see if a custom one-one broadcast 
> edge is needed for this.
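
The index arithmetic in the description can be sketched as follows. This is only an illustration, not Pig's actual partitioning code: the class name, the `partition` helper, and the sign-bit masking used to keep the result non-negative are all assumptions. It shows that when num_filters equals num_reducers, the same hash-modulo rule gives each key the same index for its bloom filter and for its reducer.

```java
public class BloomPartitionSketch {

    // Hypothetical helper: maps a join key to a bucket in [0, n).
    // Masking the sign bit is an assumption made here so that negative
    // hash codes still produce a valid, non-negative index.
    static int partition(Object joinKey, int n) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % n;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        int numFilters = numReducers; // the proposal: one bloom filter per reducer

        Object[] keys = {"a", "user42", 17, -7};
        for (Object key : keys) {
            int filter = partition(key, numFilters);
            int reducer = partition(key, numReducers);
            // With equal counts the two indices always match, so reducer i
            // would only ever need bloom filter i.
            System.out.println(key + " -> filter " + filter + ", reducer " + reducer);
        }
    }
}
```

If the two counts differ, the indices generally do not line up, which is why the current behaviour of broadcasting every filter to every join vertex is needed in that case.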



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
