----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55681/ -----------------------------------------------------------
Review request for pig, Daniel Dai and Adam Szita. Bugs: PIG-4963 https://issues.apache.org/jira/browse/PIG-4963 Repository: pig Description ------- This patch adds a new type of join called bloom. It supports creating multiple bloom filters partitioned by hashcode of key for parallelism. Two new operators and one Packager implementations are added. POBuildBloomRearrageTez - Builds the bloom filter for one of the relations of the join on the map side or writes out the join keys based on the strategy BloomPackager - Used in the reducer to create or combine bloom filters and produces the final bloom filters. POBloomFilterRearrangeTez - Applies the bloom filters to other relations in the join and filters out data. More details in the documentation. Diffs ----- http://svn.apache.org/repos/asf/pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/docs/src/documentation/content/xdocs/perf.xml 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigConfiguration.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigCombiner.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezEdgeDescriptor.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezOperator.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezPOPackageAnnotator.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/BloomPackager.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POBloomFilterRearrangeTez.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POBuildBloomRearrangeTez.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POLocalRearrangeTez.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POShuffleTezLoad.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/CombinerOptimizer.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/SecondaryKeyOptimizerTez.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOJoin.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/ScriptState.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/tez/TezScriptState.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/build.xml 1779329 http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/join.conf PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/multiquery.conf 1779329 http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/orc.conf 1779329 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEmptyInputDir.java 1779329 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-1-KeyToReducer.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-1.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-2-KeyToReducer.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-2.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-3-KeyToReducer.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-3.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-4-KeyToReducer.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-4.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-5-KeyToReducer.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-5.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-6-KeyToReducer.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-6.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-7-KeyToReducer.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-7.gld PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezCompiler.java 1779329 Diff: https://reviews.apache.org/r/55681/diff/ Testing ------- Unit and e2e tests added for many different scenarios. Thanks, Rohini Palaniswamy