[jira] [Commented] (HIVE-12462) DPP: DPP optimizers need to run on the TS predicate not FIL

Gunther Hagleitner (JIRA) Sat, 12 Dec 2015 00:52:19 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054151#comment-15054151
 ]


Gunther Hagleitner commented on HIVE-12462:
-------------------------------------------

I've looked into this some more. This should already have worked. There are 
tests in dynamic_partition_pruner.q that check for this type of join condition 
(with a UDF). However, these tests no longer work: HIVE-11634 broke them. 
[~hsubramaniyan]/[~jpullokkaran], can you please take a look at this? I don't 
think the patch proposed here is the right fix, and it should probably be 
reverted.

HIVE-11634 changes the golden file of dynamic_partition_pruner.q: it 
effectively disables the optimization, and I'm not sure why. The synthetic 
predicate in DPP is of the form (col IN (reducesink operator)), which for some 
reason gets lost in HIVE-11634.

HIVE-11634 also seems to leave you with different expressions in the table scan 
and the filter, and I'm thinking this is wrong as well (i.e., the fix in this 
patch shouldn't work either).

> DPP: DPP optimizers need to run on the TS predicate not FIL 
> ------------------------------------------------------------
>
>                 Key: HIVE-12462
>                 URL: https://issues.apache.org/jira/browse/HIVE-12462
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 2.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HIVE-12462.02.patch, HIVE-12462.1.patch
>
>
> With HIVE-11398 + HIVE-11791, the partition-condition-remover became more 
> effective.
> It removes predicates involving partition columns from the FilterExpression, 
> causing a miss for dynamic-partition pruning if the DPP relies on 
> FilterDesc.
> The TS desc will have the correct predicate in that case.
> {code}
> $hdt$_0:$hdt$_1:a
>   TableScan (TS_2)
>     alias: a
>     filterExpr: (((account_id = 22) and year(dt) is not null) and (year(dt)) IN (RS[6])) (type: boolean)
>     Filter Operator (FIL_20)
>       predicate: ((account_id = 22) and year(dt) is not null) (type: boolean)
>       Select Operator (SEL_4)
>         expressions: dt (type: date)
>         outputColumnNames: _col1
>         Reduce Output Operator (RS_8)
>           key expressions: year(_col1) (type: int)
>           sort order: +
>           Map-reduce partition columns: year(_col1) (type: int)
>           Join Operator (JOIN_9)
>             condition map:
>                  Inner Join 0 to 1
>             keys:
>               0 year(_col1) (type: int)
>               1 year(_col1) (type: int)
> {code}
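
The mismatch in the plan above can be illustrated with a small sketch. It is 
not Hive code: Hive's optimizer works on operator and expression objects, not 
on strings. The helper name synthetic_predicates is hypothetical, and the two 
predicate strings are copied from the TS_2 and FIL_20 entries of the plan. The 
sketch only looks for a synthetic predicate of the form (expr) IN (RS[n]) in 
each string.

```python
import re

# Matches the right-hand side of a synthetic DPP predicate as it is printed
# in the plan text, e.g. " IN (RS[6])". Illustrative only.
SYNTHETIC_IN = re.compile(r"\s+IN\s+\(RS\[(\d+)\]\)")


def synthetic_predicates(expr):
    """Return (pruned_expression, reduce_sink_id) pairs found in a predicate string.

    Hypothetical helper: it scans plan text, whereas a real optimizer would
    walk the expression tree.
    """
    found = []
    for match in SYNTHETIC_IN.finditer(expr):
        end = match.start()  # index just past the ')' closing the left operand
        if end == 0 or expr[end - 1] != ")":
            continue
        depth = 0
        # Walk backwards to the '(' that balances the closing ')'.
        for i in range(end - 1, -1, -1):
            if expr[i] == ")":
                depth += 1
            elif expr[i] == "(":
                depth -= 1
                if depth == 0:
                    found.append((expr[i + 1:end - 1], int(match.group(1))))
                    break
    return found


# Predicate strings taken from the plan in the issue description.
ts_filter_expr = (
    "(((account_id = 22) and year(dt) is not null) and (year(dt)) IN (RS[6]))"
)
fil_predicate = "((account_id = 22) and year(dt) is not null)"

print(synthetic_predicates(ts_filter_expr))  # [('year(dt)', 6)]
print(synthetic_predicates(fil_predicate))   # []
```

Only the table scan's filterExpr still carries the synthetic predicate, so a 
pass that reads the filter operator's predicate finds nothing to prune on.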



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
