[ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045411#comment-14045411
 ] 

Harish Butani commented on HIVE-7159:
-------------------------------------

Yes [~navis], you are right. Good catch. Sorry I missed this; the diff was huge 
and this one unfortunately slipped through.
The reason for the regression is that PredicateTransitivePropagate looks at the 
FilterOperator below the ReduceSink. 
SemanticAnalyzer::genNotNullFilterForJoinSourcePlan was stacking another 
FilterOp for the not null check, so only that predicate was being applied 
transitively by PredicateTransitivePropagate. The fix is to add the following 
in SemanticAnalyzer at line 2465:
{code}
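    // If the input is already a FilterOperator, fold the not null predicate
    // into it instead of stacking a second FilterOperator on top.
    if ( input instanceof FilterOperator ) {
      FilterOperator f = (FilterOperator) input;
      List<ExprNodeDesc> preds = new ArrayList<ExprNodeDesc>();
      preds.add(f.getConf().getPredicate());  // existing predicate
      preds.add(filterPred);                  // not null check
      f.getConf().setPredicate(ExprNodeDescUtils.mergePredicates(preds));

      return input;
    }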
{code}
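
To illustrate why stacking hides the original predicate, here is a rough, self-contained sketch. It does not use Hive's classes: Op, stackNotNullFilter, mergeNotNullFilter and visiblePredicate are hypothetical stand-ins, predicates are plain strings, and the "look at the filter below the ReduceSink" behaviour is modelled as inspecting only the topmost operator.

```java
// Hypothetical stand-ins for Hive's operator tree; not the real Hive API.
class FilterMergeSketch {

    /** Minimal operator node: a kind, an optional predicate, one parent. */
    static class Op {
        final String kind;   // "TS" (table scan) or "FIL" (filter)
        String predicate;    // null for non-filter operators
        final Op parent;

        Op(String kind, String predicate, Op parent) {
            this.kind = kind;
            this.predicate = predicate;
            this.parent = parent;
        }
    }

    /** Behaviour before the fix: always stack a new filter on the input. */
    static Op stackNotNullFilter(Op input, String notNullPred) {
        return new Op("FIL", notNullPred, input);
    }

    /** Behaviour after the fix: fold the check into an existing filter. */
    static Op mergeNotNullFilter(Op input, String notNullPred) {
        if ("FIL".equals(input.kind)) {
            input.predicate = "(" + input.predicate + ") and (" + notNullPred + ")";
            return input;
        }
        return new Op("FIL", notNullPred, input);
    }

    /** Assumed model of an optimizer that reads only the topmost filter. */
    static String visiblePredicate(Op top) {
        return "FIL".equals(top.kind) ? top.predicate : null;
    }

    public static void main(String[] args) {
        Op scan = new Op("TS", null, null);
        Op stacked = stackNotNullFilter(new Op("FIL", "key > 10", scan), "key is not null");
        Op merged = mergeNotNullFilter(new Op("FIL", "key > 10", scan), "key is not null");

        System.out.println(visiblePredicate(stacked)); // key is not null
        System.out.println(visiblePredicate(merged));  // (key > 10) and (key is not null)
    }
}
```

In the stacked plan only the not null check is visible at the top, while the merged plan exposes both conditions in one filter.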

Tested auto_join29.q with this change; the predicate now contains 'key > 10'.

Will file a JIRA and upload a patch tomorrow.

> For inner joins push a 'is not null predicate' to the join sources for every 
> non nullSafe join condition
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7159
>                 URL: https://issues.apache.org/jira/browse/HIVE-7159
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7159.1.patch, HIVE-7159.10.patch, 
> HIVE-7159.11.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, 
> HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, 
> HIVE-7159.9.patch, HIVE-7159.addendum.patch
>
>
> A join B on A.x = B.y
> can be transformed to
> (A where x is not null) join (B where y is not null) on A.x = B.y
> Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
> reduce-side skew when there are a lot of null values in the data.
> Thanks to [~gopalv] for the analysis and coming up with the solution.
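
The transformation quoted above can be checked on toy data. The sketch below is only an illustration with hypothetical names (NotNullJoinSketch, innerJoin, notNull) and integer keys in plain lists; it relies on the fact that a non nullSafe equality never matches a null key, so dropping null keys before the join leaves the result unchanged while fewer rows reach the join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of an inner equi-join on nullable keys; not Hive code.
class NotNullJoinSketch {

    /** Nested-loop inner join; a null key never equals anything. */
    static List<String> innerJoin(List<Integer> a, List<Integer> b) {
        List<String> out = new ArrayList<>();
        for (Integer x : a) {
            for (Integer y : b) {
                if (x != null && y != null && x.equals(y)) {
                    out.add(x + "=" + y);
                }
            }
        }
        return out;
    }

    /** The pushed-down "is not null" filter on a join source. */
    static List<Integer> notNull(List<Integer> keys) {
        List<Integer> kept = new ArrayList<>();
        for (Integer k : keys) {
            if (k != null) {
                kept.add(k);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, null, 2, null, null);
        List<Integer> b = Arrays.asList(null, 2, 3, null);

        // Same join result with and without the pre-filter.
        System.out.println(innerJoin(a, b));                   // [2=2]
        System.out.println(innerJoin(notNull(a), notNull(b))); // [2=2]

        // Rows that would be sent to the join: 9 before, 4 after filtering.
        System.out.println(a.size() + b.size());
        System.out.println(notNull(a).size() + notNull(b).size());
    }
}
```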



--
This message was sent by Atlassian JIRA
(v6.2#6252)
