[ https://issues.apache.org/jira/browse/PIG-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy resolved PIG-5260.
-------------------------------------
    Resolution: Invalid

This is totally invalid. Writing something when you totally lack sleep is a bad thing. I had interchanged the left-side load vertex with the join vertex. The idea can still be valid when the left side is an intermediate reducer whose output is the result of a group by on the same key as the join key, but it is not worth the effort.

> Separate bloom filter for each reducer of the join
> --------------------------------------------------
>
>                 Key: PIG-5260
>                 URL: https://issues.apache.org/jira/browse/PIG-5260
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
> Currently, bloom join allows specifying the number of bloom filters, and all
> of them are broadcast to each join vertex. The bloom filter partition logic
> is joinkey hashcode % num_filters, and the reducer partition logic is joinkey
> hashcode % num_reducers. If we made the number of bloom filters equal to the
> number of reducers in the join, we could broadcast just bloom filter 0 to
> reducer 0, bloom filter 1 to reducer 1, and so on. However, a one-one edge
> will most likely prevent auto-reduce parallelism from being applied to the
> scatter-gather edge, so we need to see whether a custom one-one broadcast
> edge is needed for this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
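The partition arithmetic behind the original proposal can be sketched as follows. This is a hypothetical illustration, not Pig's bloom join code. It assumes that filters and reducers both use `hashcode % n`, with the sign bit masked off so the index is never negative (as Hadoop's `HashPartitioner` does). It also uses a `HashSet` as a stand-in for a real bloom filter. All class and method names are made up for the example. It shows only why filter `i` would line up with reducer `i` when the counts are equal. Per the resolution above, this does not help in the general case, because the filtering happens in the left-side load vertex, not in the join vertex.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BloomPerReducerSketch {

    // Hypothetical helper: the same "hashcode % n" rule is assumed for both
    // bloom filter selection and reducer selection. Masking the sign bit keeps
    // the index non-negative.
    static int partition(Object joinKey, int n) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % n;
    }

    // Builds one filter per partition from the build-side join keys.
    // A HashSet stands in for a bloom filter (no false positives here).
    static List<Set<Object>> buildFilters(List<?> buildSideKeys, int numFilters) {
        List<Set<Object>> filters = new ArrayList<>();
        for (int i = 0; i < numFilters; i++) {
            filters.add(new HashSet<>());
        }
        for (Object key : buildSideKeys) {
            filters.get(partition(key, numFilters)).add(key);
        }
        return filters;
    }

    // With numFilters == numReducers, a key routed to reducer r can only live
    // in filter r, so reducer r would need just that one filter.
    static boolean mightMatch(List<Set<Object>> filters, int reducer, Object key) {
        return filters.get(reducer).contains(key);
    }

    public static void main(String[] args) {
        int numReducers = 4;
        List<Set<Object>> filters = buildFilters(List.of("a", "b", "c", "d", "e"), numReducers);
        for (String key : List.of("a", "c", "x", "y")) {
            int reducer = partition(key, numReducers);
            boolean inAnyFilter = false;
            for (Set<Object> f : filters) {
                inAnyFilter |= f.contains(key);
            }
            System.out.println(key + " -> reducer " + reducer
                    + ", own filter: " + mightMatch(filters, reducer, key)
                    + ", any filter: " + inAnyFilter);
        }
    }
}
```

For every key, checking only the reducer's own filter gives the same answer as checking all filters, which is what would make broadcasting filter `i` only to reducer `i` sufficient under these assumptions.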