Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez

Jason Dere Tue, 30 Jun 2015 18:13:03 -0700


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 672
> > <https://reviews.apache.org/r/34059/diff/1/?file=955661#file955661line672>
> >
> >     Move this to ExprNodeUtils?


I can move both versions of flattenExpr() to ExprNodeDescUtils, but I think 
flattenExprsIfNecessary() contains details specific to the MapJoinOperator


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 669
> > <https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line669>
> >
> >     Nit: typo initializion -> initialization

will fix


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 691
> > <https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line691>
> >
> >     May want to remove this comment in case we don't need it.

will fix


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 705
> > <https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line705>
> >
> >     Incomplete comment needs fixing.

This comment is unnecessary, looks like remnants of an old version of 
ExprNodeDescUtils.resolveJoinKeysAsRSColumns() which I cut/pasted to create the 
initial version of flattenExpr(). I'll remove the comment.


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 721
> > <https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line721>
> >
> >     Can you add comments as to why we need to do that?

Adding comments to flattenExpr()


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java, 
> > line 26
> > <https://reviews.apache.org/r/34059/diff/5/?file=989883#file989883line26>
> >
> >     Can you add comments for describing the need/use of this class?

Adding comment:

/**
 * Provides a key/values (note the plural values) interface out of a Key/Value 
reader,
 * needed by ReduceRecordSource when reading input from a key/value source.
 */


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java, 
> > line 24
> > <https://reviews.apache.org/r/34059/diff/5/?file=989884#file989884line24>
> >
> >     Comments please.

will add comment


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java, 
> > line 186
> > <https://reviews.apache.org/r/34059/diff/5/?file=989886#file989886line186>
> >
> >     Not sure if this is correct? Shouldn't this be true only if this is 
> > dynamically partitioned hash join?

Not totally sure, but I'm thinking this should always be done for vectorized 
rowObjectInspector. I think if any other reduce-side operation had to do any 
task-time initialization involving the vectorized row Object Inspectors, they 
would also hit the same issue hit in this patch, where there is a mismatch 
between the compile-time rowObjectInspector and the query-time 
rowObjectInspector (for vectorized queries only).

Another way to avoid this workaround (plus flattenExprs()) would be to simply 
bypass all of this unnecessary initialization in the vectorized MapJoin 
operator, since according to Matt McCline many of the object inspectors that 
are set up during MapJoin initialization are not actually used in the 
vectorized version. But right now I have no idea what the unnecessary parts are.


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java, 
> > line 657
> > <https://reviews.apache.org/r/34059/diff/5/?file=989889#file989889line657>
> >
> >     Is there ">>" in the code here?

No, it's HashSet< Operator<?> >


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 695
> > <https://reviews.apache.org/r/34059/diff/5/?file=989880#file989880line695>
> >
> >     Rename function to flattenExprList makes code easier to read.

will change


> On June 25, 2015, 11:38 p.m., Vikram Dixit Kumaraswamy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, 
> > line 216
> > <https://reviews.apache.org/r/34059/diff/1/?file=955672#file955672line216>
> >
> >     Minreducers for conservative estimation

There is no MINREDUCERS config parameter. Discussed offline with Vikram, a more 
appropriate value is ReduceSinkDesc.getNumReducers().


- Jason


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34059/#review83846
-----------------------------------------------------------


On June 22, 2015, 7:30 p.m., Jason Dere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34059/
> -----------------------------------------------------------
> 
> (Updated June 22, 2015, 7:30 p.m.)
> 
> 
> Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
> 
> 
> Bugs: HIVE-10673
>     https://issues.apache.org/jira/browse/HIVE-10673
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the 
> reducer are unsorted.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df 
>   itests/src/test/resources/testconfiguration.properties 7b7559a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 
> 545d7c6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 
> cdabe3a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
> e9bd44a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
>  4c8c4b1 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 5a87bd6 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 
> 4d84f0f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
> bca91dd 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 
>   ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34059/diff/
> 
> 
> Testing
> -------
> 
> q-file tests added
> 
> 
> Thanks,
> 
> Jason Dere
> 
>

Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez

Reply via email to