[jira] [Commented] (HIVE-8168) With dynamic partition enabled fact table selectivity is not taken into account when generating the physical plan (Use CBO cardinality using physical plan generation)

Mostafa Mokhtar (JIRA) Fri, 03 Oct 2014 20:40:39 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-8168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158924#comment-14158924
 ]


Mostafa Mokhtar commented on HIVE-8168:
---------------------------------------

The filter on the PK side should not be on the join column as conjunction rules 
will kick in and apply a filter on the join column from the FK side. 
Can you add tests with filter on columns other than s_store_sk? 

Thanks 
Mostafa 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


> With dynamic partition enabled fact table selectivity is not taken into 
> account when generating the physical plan (Use CBO cardinality using physical 
> plan generation)
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8168
>                 URL: https://issues.apache.org/jira/browse/HIVE-8168
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Prasanth J
>            Priority: Critical
>              Labels: performance
>             Fix For: vectorization-branch, 0.14.0
>
>         Attachments: HIVE-8168.1.patch
>
>
> When calculating estimate row counts & data size during physical plan 
> generation in StatsRulesProcFactory doesn't know that there will be dynamic 
> partition pruning and it is hard to know how many partitions will qualify at 
> runtime, as a result with Dynamic partition pruning enabled a query 32 can 
> run with 570 compared to 70 tasks with dynamic partition pruning disabled and 
> actual partition filters on the fact table.
> The long term solution for this issue is to use the cardinality estimates 
> from CBO as it takes into account join selectivity and such, estimate from 
> CBO won't address the number of the tasks used for the partitioned table but 
> they will address the incorrect number of tasks used for the concequent 
> reducers where the majority of the slowdown is coming from.
> Plan dynamic partition pruning on 
> {code}
>    Map 5 
>             Map Operator Tree:
>                 TableScan
>                   alias: ss
>                   filterExpr: ss_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 550076554 Data size: 47370018896 
> Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: ss_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 275038277 Data size: 23685009448 
> Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {ss_store_sk} {ss_net_profit}
>                         1 
>                       keys:
>                         0 ss_sold_date_sk (type: int)
>                         1 d_date_sk (type: int)
>                       outputColumnNames: _col6, _col21
>                       input vertices:
>                         1 Map 1
>                       Statistics: Num rows: 302542112 Data size: 26053511168 
> Basic stats: COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col21}
>                           1 {s_county} {s_state}
>                         keys:
>                           0 _col6 (type: int)
>                           1 s_store_sk (type: int)
>                         outputColumnNames: _col21, _col80, _col81
>                         input vertices:
>                           1 Map 2
>                         Statistics: Num rows: 332796320 Data size: 
> 28658862080 Basic stats: COMPLETE Column stats: NONE
>                         Map Join Operator
>                           condition map:
>                                Left Semi Join 0 to 1
>                           condition expressions:
>                             0 {_col21} {_col80} {_col81}
>                             1 
>                           keys:
>                             0 _col81 (type: string)
>                             1 _col0 (type: string)
>                           outputColumnNames: _col21, _col80, _col81
>                           input vertices:
>                             1 Reducer 11
>                           Statistics: Num rows: 366075968 Data size: 
> 31524749312 Basic stats: COMPLETE Column stats: NONE
>                           Select Operator
>                             expressions: _col81 (type: string), _col80 (type: 
> string), _col21 (type: float)
>                             outputColumnNames: _col81, _col80, _col21
>                             Statistics: Num rows: 366075968 Data size: 
> 31524749312 Basic stats: COMPLETE Column stats: NONE
>                             Group By Operator
>                               aggregations: sum(_col21)
>                               keys: _col81 (type: string), _col80 (type: 
> string), '0' (type: string)
>                               mode: hash
>                               outputColumnNames: _col0, _col1, _col2, _col3
>                               Statistics: Num rows: 1098227904 Data size: 
> 94574247936 Basic stats: COMPLETE Column stats: NONE
>                               Reduce Output Operator
>                                 key expressions: _col0 (type: string), _col1 
> (type: string), _col2 (type: string)
>                                 sort order: +++
>                                 Map-reduce partition columns: _col0 (type: 
> string), _col1 (type: string), _col2 (type: string)
>                                 Statistics: Num rows: 1098227904 Data size: 
> 94574247936 Basic stats: COMPLETE Column stats: NONE
>                                 value expressions: _col3 (type: double)
> {code}
> Plan snippet with partition pruning off and explicit partition filters
> {code}
>     Map 5 
>             Map Operator Tree:
>                 TableScan
>                   alias: ss
>                   filterExpr: ((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_sold_date BETWEEN '1999-06-01' AND '2000-05-31') (type: 
> boolean)
>                   Statistics: Num rows: 110339135 Data size: 4817453454 Basic 
> stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (ss_sold_date_sk is not null and ss_store_sk 
> is not null) (type: boolean)
>                     Statistics: Num rows: 27584784 Data size: 1204363374 
> Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {ss_store_sk} {ss_net_profit}
>                         1 
>                       keys:
>                         0 ss_sold_date_sk (type: int)
>                         1 d_date_sk (type: int)
>                       outputColumnNames: _col7, _col22
>                       input vertices:
>                         1 Map 1
>                       Statistics: Num rows: 30343264 Data size: 1324799744 
> Basic stats: COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col22}
>                           1 {s_county} {s_state}
>                         keys:
>                           0 _col7 (type: int)
>                           1 s_store_sk (type: int)
>                         outputColumnNames: _col22, _col81, _col82
>                         input vertices:
>                           1 Map 2
>                         Statistics: Num rows: 33377592 Data size: 1457279744 
> Basic stats: COMPLETE Column stats: NONE
>                         Map Join Operator
>                           condition map:
>                                Left Semi Join 0 to 1
>                           condition expressions:
>                             0 {_col22} {_col81} {_col82}
>                             1 
>                           keys:
>                             0 _col82 (type: string)
>                             1 _col0 (type: string)
>                           outputColumnNames: _col22, _col81, _col82
>                           input vertices:
>                             1 Reducer 11
>                           Statistics: Num rows: 36715352 Data size: 
> 1603007744 Basic stats: COMPLETE Column stats: NONE
>                           Select Operator
>                             expressions: _col82 (type: string), _col81 (type: 
> string), _col22 (type: float)
>                             outputColumnNames: _col82, _col81, _col22
>                             Statistics: Num rows: 36715352 Data size: 
> 1603007744 Basic stats: COMPLETE Column stats: NONE
>                             Group By Operator
>                               aggregations: sum(_col22)
>                               keys: _col82 (type: string), _col81 (type: 
> string), '0' (type: string)
>                               mode: hash
>                               outputColumnNames: _col0, _col1, _col2, _col3
>                               Statistics: Num rows: 36715352 Data size: 
> 1603007744 Basic stats: COMPLETE Column stats: NONE
>                               Reduce Output Operator
>                                 key expressions: _col0 (type: string), _col1 
> (type: string), _col2 (type: string)
>                                 sort order: +++
>                                 Map-reduce partition columns: _col0 (type: 
> string), _col1 (type: string), _col2 (type: string)
>                                 Statistics: Num rows: 36715352 Data size: 
> 1603007744 Basic stats: COMPLETE Column stats: NONE
>                                 value expressions: _col3 (type: double)
> {code}
> Plan snippet with partition pruning on and fetch column stats also on , in 
> this case row estimate a
> {code}
>       Map 5 
>             Map Operator Tree:
>                 TableScan
>                   alias: ss
>                   filterExpr: ss_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 550076554 Data size: 47370018896 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ss_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 537120379 Data size: 4195767284 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {ss_store_sk} {ss_net_profit}
>                         1 
>                       keys:
>                         0 ss_sold_date_sk (type: int)
>                         1 d_date_sk (type: int)
>                       outputColumnNames: _col6, _col21
>                       input vertices:
>                         1 Map 1
>                       Statistics: Num rows: 36524 Data size: 292192 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col21}
>                           1 {s_county} {s_state}
>                         keys:
>                           0 _col6 (type: int)
>                           1 s_store_sk (type: int)
>                         outputColumnNames: _col21, _col80, _col81
>                         input vertices:
>                           1 Map 2
>                         Statistics: Num rows: 45017 Data size: 8508213 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                         Map Join Operator
>                           condition map:
>                                Left Semi Join 0 to 1
>                           condition expressions:
>                             0 {_col21} {_col80} {_col81}
>                             1 
>                           keys:
>                             0 _col81 (type: string)
>                             1 _col0 (type: string)
>                           outputColumnNames: _col21, _col80, _col81
>                           input vertices:
>                             1 Reducer 11
>                           Statistics: Num rows: 49518 Data size: 9359035 
> Basic stats: COMPLETE Column stats: NONE
>                           Select Operator
>                             expressions: _col81 (type: string), _col80 (type: 
> string), _col21 (type: float)
>                             outputColumnNames: _col81, _col80, _col21
>                             Statistics: Num rows: 49518 Data size: 9359035 
> Basic stats: COMPLETE Column stats: NONE
>                             Group By Operator
>                               aggregations: sum(_col21)
>                               keys: _col81 (type: string), _col80 (type: 
> string), '0' (type: string)
>                               mode: hash
>                               outputColumnNames: _col0, _col1, _col2, _col3
>                               Statistics: Num rows: 148554 Data size: 
> 28077105 Basic stats: COMPLETE Column stats: NONE
>                               Reduce Output Operator
>                                 key expressions: _col0 (type: string), _col1 
> (type: string), _col2 (type: string)
>                                 sort order: +++
>                                 Map-reduce partition columns: _col0 (type: 
> string), _col1 (type: string), _col2 (type: string)
>                                 Statistics: Num rows: 148554 Data size: 
> 28077105 Basic stats: COMPLETE Column stats: NONE
>                                 value expressions: _col3 (type: double)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8168) With dynamic partition enabled fact table selectivity is not taken into account when generating the physical plan (Use CBO cardinality using physical plan generation)

Reply via email to