[ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511706#comment-16511706 ]
Rohini Palaniswamy commented on PIG-5342:
-----------------------------------------

Comments:
1) Bloom join is also ideal in cases of right outer join with the smaller dataset on the right, which is not supported by replicated join.
2) edge.setCombinerInMap(true); and edge.setCombinerInReducer(true); are redundant.
3) edge.partitionerClass = BloomFilterPartitioner.class; should be set only for the reducer case. Same for the key and value types.
4) combineBloomOp is not used anymore and should be removed.
5) resuleWithCombiner -> resultWithCombiner
6) Can avoid the new NullableTuple() in bloomWriter.write(new NullableIntWritable(i), new NullableTuple(tuple));

> Add setting to turn off bloom join combiner
> -------------------------------------------
>
>                 Key: PIG-5342
>                 URL: https://issues.apache.org/jira/browse/PIG-5342
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>         Attachments: PIG-5342-1.patch
>
>
> 1) Need a new setting pig.bloomjoin.nocombiner to turn off the combiner for bloom
> join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Mention in documentation that bloom join is also ideal in cases of right
> outer join with the smaller dataset on the right. Replicated join only supports
> left outer join.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
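
For illustration, the two points in the issue description might look like this in a Pig script. This is only a sketch: the setting name pig.bloomjoin.nocombiner is taken from the issue text and may differ in the committed patch, and the relation names, paths, and schemas are hypothetical.

```pig
-- Assumed setting name from the issue description; turns off the bloom join combiner.
SET pig.bloomjoin.nocombiner true;

-- Hypothetical inputs, for illustration only.
large = LOAD 'large_input' AS (id:long, payload:chararray);
small = LOAD 'small_input' AS (id:long, label:chararray);

-- Right outer join with the smaller relation on the right,
-- the case that replicated join does not support.
joined = JOIN large BY id RIGHT OUTER, small BY id USING 'bloom';
```

Turning the combiner off is only expected to help when the join keys are all unique, since the combiner then has nothing to merge.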