[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

Laljo John Pullokkaran (JIRA) Wed, 20 May 2015 19:10:44 -0700

    [ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553515#comment-14553515 ]


Laljo John Pullokkaran commented on HIVE-9069:
----------------------------------------------

1. Step #3 in the Calcite planner seems to be a mix of constant propagation and common filter extraction (please adjust the comment).
2. We need to run common filter extraction after transitive inference, because transitive inference may add additional predicates.
   Ideally this should be added to the transitive inference bucket so that it fires on new inferences and vice versa, but we may need to be cautious about the performance impact.
3. This optimization can be applied only to a disjunction of conjunctions, but the code does not seem to restrict it that way, or maybe I am misreading it.
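A toy model of points 1 to 3, for illustration only: it is not the Calcite/Hive code under review, and the atom encoding, `col_eq`, `propagate_constants`, `extract_common` and `simplify` are hypothetical names. A predicate is a disjunction (list) of conjunctions (frozensets of atoms), the only shape the extraction is valid for (point 3). Constant propagation through column equalities stands in for transitive inference; running it before extraction (point 2) lets inferred atoms become common factors:

```python
from typing import FrozenSet, List, Tuple

# Hypothetical atom encoding (not Hive or Calcite classes):
#   ("=", column, literal)  means column = literal
#   ("==", col_a, col_b)    means col_a = col_b, with col_a <= col_b
Atom = Tuple[str, str, object]
Conj = FrozenSet[Atom]


def col_eq(x: str, y: str) -> Atom:
    """Column equality in canonical order, so a = b and b = a compare equal."""
    first, second = sorted((x, y))
    return ("==", first, second)


def propagate_constants(conj: Conj) -> Conj:
    """Toy transitive inference: from col = lit and col = other, add other = lit."""
    atoms = set(conj)
    changed = True
    while changed:  # terminates: atoms only grow, over finitely many columns/literals
        changed = False
        consts = {col: lit for op, col, lit in atoms if op == "="}
        for op, x, y in list(atoms):
            if op != "==":
                continue
            for src, dst in ((x, y), (y, x)):
                if src in consts and ("=", dst, consts[src]) not in atoms:
                    atoms.add(("=", dst, consts[src]))
                    changed = True
    return frozenset(atoms)


def extract_common(dnf: List[Conj]) -> Tuple[Conj, List[Conj]]:
    """Rewrite (C and X1) or (C and X2) or ... as C and (X1 or X2 or ...).

    Only valid for a non-empty disjunction of conjunctions. Returns
    (C, [X1, X2, ...]); an empty Xi is TRUE, so the residual OR is then TRUE.
    """
    common = frozenset.intersection(*dnf)
    return common, [conj - common for conj in dnf]


def simplify(dnf: List[Conj]) -> Tuple[Conj, List[Conj]]:
    """Infer first, then extract, so inferred atoms can be factored out."""
    return extract_common([propagate_constants(conj) for conj in dnf])


if __name__ == "__main__":
    a_eq_b = col_eq("a", "b")
    # (a = 1 and a = b) or (b = 1 and a = b)
    dnf = [frozenset({("=", "a", 1), a_eq_b}), frozenset({("=", "b", 1), a_eq_b})]
    print(sorted(extract_common(dnf)[0]))  # only a = b is common
    print(sorted(simplify(dnf)[0]))        # a = 1, b = 1 and a = b are common
```

With `(a = 1 and a = b) or (b = 1 and a = b)`, extraction alone factors out only `a = b`, while inference followed by extraction factors out `a = 1`, `b = 1` and `a = b`, leaving an always-true residual. The same pattern appears in the quoted query, where `cd1.cd_marital_status = 'M' and cd1.cd_marital_status = cd2.cd_marital_status` implies `cd2.cd_marital_status = 'M'` inside that disjunct.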


> Simplify filter predicates for CBO
> ----------------------------------
>
>                 Key: HIVE-9069
>                 URL: https://issues.apache.org/jira/browse/HIVE-9069
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 0.14.1
>
>         Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
> HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
> HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, 
> HIVE-9069.08.patch, HIVE-9069.patch
>
>
> Simplify disjunctive predicates so that they can be pushed down to the scan.
> This still looks like an issue: some of the filters that could be pushed down to the scan are not.
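One way to push part of such a disjunction down to a scan, sketched under assumptions (atoms are hypothetical `(tables, SQL text)` pairs, and this is not necessarily what the attached patches do): each disjunct D_i implies the conjunction P_i of its atoms that reference only table T, so D_1 OR ... OR D_n implies P_1 OR ... OR P_n. That weaker predicate could be evaluated at T's scan while the original predicate is still applied after the joins. If any disjunct has no atom on T, nothing useful is implied for T.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical encoding: (tables referenced by the atom, SQL text of the atom).
Atom = Tuple[FrozenSet[str], str]


def atom(text: str, *tables: str) -> Atom:
    return frozenset(tables), text


def implied_scan_filter(dnf: List[List[Atom]], table: str) -> Optional[str]:
    """Return a predicate on `table` alone that is implied by the OR-of-ANDs `dnf`.

    The result is weaker than `dnf`, so the original predicate must still be
    evaluated later; returns None when some disjunct does not constrain `table`.
    """
    parts = []
    for conj in dnf:
        local = [text for tables, text in conj if tables == frozenset({table})]
        if not local:
            return None  # this disjunct says nothing about `table`: P_i is TRUE
        parts.append("(" + " and ".join(local) + ")")
    return " or ".join(parts)


if __name__ == "__main__":
    # The customer_address / web_sales disjunction from the query below.
    addr = [
        [atom("ca_country = 'United States'", "customer_address"),
         atom("ca_state in ('KY', 'GA', 'NM')", "customer_address"),
         atom("ws_net_profit between 100 and 200", "web_sales")],
        [atom("ca_country = 'United States'", "customer_address"),
         atom("ca_state in ('MT', 'OR', 'IN')", "customer_address"),
         atom("ws_net_profit between 150 and 300", "web_sales")],
        [atom("ca_country = 'United States'", "customer_address"),
         atom("ca_state in ('WI', 'MO', 'WV')", "customer_address"),
         atom("ws_net_profit between 50 and 250", "web_sales")],
    ]
    print(implied_scan_filter(addr, "web_sales"))
    print(implied_scan_filter(addr, "customer_address"))
```

For customer_address the derived predicate still carries `ca_country = 'United States'`, which the plan below already pushes to the Map 10 scan; the `ca_state` lists and the `ws_net_profit` ranges are what only appear in the post-join Filter Operator of Reducer 6.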
> {code}
> set hive.cbo.enable=true;
> set hive.stats.fetch.column.stats=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.tez.auto.reducer.parallelism=true;
> set hive.auto.convert.join.noconditionaltask.size=320000000;
> set hive.exec.reducers.bytes.per.reducer=100000000;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
> set hive.support.concurrency=false;
> set hive.tez.exec.print.summary=true;
> explain
> select  substr(r_reason_desc,1,20) as r
>        ,avg(ws_quantity) wq
>        ,avg(wr_refunded_cash) ref
>        ,avg(wr_fee) fee
>  from web_sales, web_returns, web_page, customer_demographics cd1,
>       customer_demographics cd2, customer_address, date_dim, reason
>  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
>    and web_sales.ws_item_sk = web_returns.wr_item_sk
>    and web_sales.ws_order_number = web_returns.wr_order_number
>    and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
>    and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk
>    and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
>    and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
>    and reason.r_reason_sk = web_returns.wr_reason_sk
>    and
>    (
>     (
>      cd1.cd_marital_status = 'M'
>      and
>      cd1.cd_marital_status = cd2.cd_marital_status
>      and
>      cd1.cd_education_status = '4 yr Degree'
>      and
>      cd1.cd_education_status = cd2.cd_education_status
>      and
>      ws_sales_price between 100.00 and 150.00
>     )
>    or
>     (
>      cd1.cd_marital_status = 'D'
>      and
>      cd1.cd_marital_status = cd2.cd_marital_status
>      and
>      cd1.cd_education_status = 'Primary'
>      and
>      cd1.cd_education_status = cd2.cd_education_status
>      and
>      ws_sales_price between 50.00 and 100.00
>     )
>    or
>     (
>      cd1.cd_marital_status = 'U'
>      and
>      cd1.cd_marital_status = cd2.cd_marital_status
>      and
>      cd1.cd_education_status = 'Advanced Degree'
>      and
>      cd1.cd_education_status = cd2.cd_education_status
>      and
>      ws_sales_price between 150.00 and 200.00
>     )
>    )
>    and
>    (
>     (
>      ca_country = 'United States'
>      and
>      ca_state in ('KY', 'GA', 'NM')
>      and ws_net_profit between 100 and 200
>     )
>     or
>     (
>      ca_country = 'United States'
>      and
>      ca_state in ('MT', 'OR', 'IN')
>      and ws_net_profit between 150 and 300
>     )
>     or
>     (
>      ca_country = 'United States'
>      and
>      ca_state in ('WI', 'MO', 'WV')
>      and ws_net_profit between 50 and 250
>     )
>    )
> group by r_reason_desc
> order by r, wq, ref, fee
> limit 100;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 9 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
>         Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
>         Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
>         Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
>         Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
>         Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
>       DagName: mmokhtar_20141111161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: web_page
>                   filterExpr: wp_web_page_sk is not null (type: boolean)
>                   Statistics: Num rows: 4602 Data size: 2696178 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: wp_web_page_sk is not null (type: boolean)
>                     Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: wp_web_page_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 10 
>             Map Operator Tree:
>                 TableScan
>                   alias: customer_address
>                   filterExpr: ((ca_country = 'United States') and ca_address_sk is not null) (type: boolean)
>                   Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ca_country = 'United States') and ca_address_sk is not null) (type: boolean)
>                     Statistics: Num rows: 20000000 Data size: 3740000000 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ca_address_sk (type: int), ca_state (type: string)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 20000000 Data size: 1800000000 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 20000000 Data size: 1800000000 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col1 (type: string)
>             Execution mode: vectorized
>         Map 11 
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: ((d_year = 1998) and d_date_sk is not null) (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((d_year = 1998) and d_date_sk is not null) (type: boolean)
>                     Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: d_date_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col0 (type: int)
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           keys: _col0 (type: int)
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 326 Data size: 1304 Basic stats: COMPLETE Column stats: COMPLETE
>                           Dynamic Partitioning Event Operator
>                             Target Input: web_sales
>                             Partition key expr: ws_sold_date_sk
>                             Statistics: Num rows: 326 Data size: 1304 Basic stats: COMPLETE Column stats: COMPLETE
>                             Target column: ws_sold_date_sk
>                             Target Vertex: Map 9
>             Execution mode: vectorized
>         Map 12 
>             Map Operator Tree:
>                 TableScan
>                   alias: reason
>                   filterExpr: r_reason_sk is not null (type: boolean)
>                   Statistics: Num rows: 72 Data size: 14400 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: r_reason_sk is not null (type: boolean)
>                     Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: r_reason_sk (type: int), r_reason_desc (type: string)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col1 (type: string)
>             Execution mode: vectorized
>         Map 13 
>             Map Operator Tree:
>                 TableScan
>                   alias: web_returns
>                   filterExpr: (((((wr_refunded_cdemo_sk is not null and wr_item_sk is not null) and wr_order_number is not null) and wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and wr_reason_sk is not null) (type: boolean)
>                   Statistics: Num rows: 2062802370 Data size: 185695406284 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((((wr_refunded_cdemo_sk is not null and wr_item_sk is not null) and wr_order_number is not null) and wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and wr_reason_sk is not null) (type: boolean)
>                     Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: wr_item_sk (type: int), wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_order_number (type: int), wr_fee (type: float), wr_refunded_cash (type: float)
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
>                       Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col1 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col1 (type: int)
>                         Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col0 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: float), _col7 (type: float)
>             Execution mode: vectorized
>         Map 14 
>             Map Operator Tree:
>                 TableScan
>                   alias: cd1
>                   filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
>                   Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
>                     Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
>                         sort order: +++
>                         Map-reduce partition columns: _col0 (type: int), _col1 (type: string), _col2 (type: string)
>                         Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: cd1
>                   filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
>                   Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
>                     Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col1 (type: string), _col2 (type: string)
>             Execution mode: vectorized
>         Map 9 
>             Map Operator Tree:
>                 TableScan
>                   alias: web_sales
>                   filterExpr: ((ws_web_page_sk is not null and ws_item_sk is not null) and ws_order_number is not null) (type: boolean)
>                   Statistics: Num rows: 21594638446 Data size: 2850189889652 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ws_web_page_sk is not null and ws_item_sk is not null) and ws_order_number is not null) (type: boolean)
>                     Statistics: Num rows: 21591939929 Data size: 604541956128 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ws_item_sk (type: int), ws_web_page_sk (type: int), ws_order_number (type: int), ws_quantity (type: int), ws_sales_price (type: float), ws_net_profit (type: float), ws_sold_date_sk (type: int)
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
>                       Statistics: Num rows: 21591939929 Data size: 604541956128 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col0} {_col2} {_col3} {_col4} {_col5} {_col6}
>                           1 
>                         keys:
>                           0 _col1 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6
>                         input vertices:
>                           1 Map 1
>                         Statistics: Num rows: 21591939072 Data size: 518206537728 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: int), _col2 (type: int)
>                           sort order: ++
>                           Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
>                           Statistics: Num rows: 21591939072 Data size: 518206537728 Basic stats: COMPLETE Column stats: COMPLETE
>                           value expressions: _col3 (type: int), _col4 (type: float), _col5 (type: float), _col6 (type: int)
>             Execution mode: vectorized
>         Reducer 3 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col6}
>                   1 {VALUE._col0} {VALUE._col1}
>                 outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col9, _col10
>                 Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: int), _col10 (type: string), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: float), _col7 (type: float), _col9 (type: string)
>                   outputColumnNames: _col0, _col10, _col2, _col3, _col4, _col5, _col6, _col7, _col9
>                   Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: int), _col5 (type: int)
>                     sort order: ++
>                     Map-reduce partition columns: _col0 (type: int), _col5 (type: int)
>                     Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE
>                     value expressions: _col2 (type: int), _col3 (type: int), _col4 (type: int), _col6 (type: float), _col7 (type: float), _col9 (type: string), _col10 (type: string)
>         Reducer 4 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4}
>                   1 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col7} {VALUE._col8}
>                 outputColumnNames: _col3, _col4, _col5, _col6, _col10, _col11, _col12, _col14, _col15, _col17, _col18
>                 Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (((_col17 = 'M') and ((_col18 = '4 yr Degree') and _col4 BETWEEN 100.0 AND 150.0)) or (((_col17 = 'D') and ((_col18 = 'Primary') and _col4 BETWEEN 50.0 AND 100.0)) or ((_col17 = 'U') and ((_col18 = 'Advanced Degree') and _col4 BETWEEN 150.0 AND 200.0)))) (type: boolean)
>                   Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col11 (type: int), _col12 (type: int), _col14 (type: float), _col15 (type: float), _col17 (type: string), _col18 (type: string), _col3 (type: int), _col5 (type: float), _col6 (type: int), _col10 (type: int)
>                     outputColumnNames: _col10, _col11, _col13, _col14, _col17, _col18, _col3, _col5, _col6, _col9
>                     Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col10 (type: int), _col17 (type: string), _col18 (type: string)
>                       sort order: +++
>                       Map-reduce partition columns: _col10 (type: int), _col17 (type: string), _col18 (type: string)
>                       Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: _col3 (type: int), _col5 (type: float), _col6 (type: int), _col9 (type: int), _col11 (type: int), _col13 (type: float), _col14 (type: float)
>         Reducer 5 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 
>                   1 {VALUE._col3} {VALUE._col5} {VALUE._col6} {VALUE._col9} {VALUE._col10} {VALUE._col12} {VALUE._col13}
>                 outputColumnNames: _col6, _col8, _col9, _col12, _col14, _col16, _col17
>                 Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col12 (type: int), _col14 (type: int), _col16 (type: float), _col17 (type: float), _col6 (type: int), _col8 (type: float), _col9 (type: int)
>                   outputColumnNames: _col12, _col14, _col16, _col17, _col6, _col8, _col9
>                   Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col12 (type: int)
>                     sort order: +
>                     Map-reduce partition columns: _col12 (type: int)
>                     Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE
>                     value expressions: _col6 (type: int), _col8 (type: float), _col9 (type: int), _col14 (type: int), _col16 (type: float), _col17 (type: float)
>         Reducer 6 
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {VALUE._col0}
>                   1 {VALUE._col6} {VALUE._col8} {VALUE._col9} {VALUE._col13} {VALUE._col15} {VALUE._col16}
>                 outputColumnNames: _col1, _col9, _col11, _col12, _col17, _col19, _col20
>                 Statistics: Num rows: 1593658752 Data size: 156178557696 Basic stats: COMPLETE Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (((_col1) IN ('KY', 'GA', 'NM') and _col11 BETWEEN 100 AND 200) or (((_col1) IN ('MT', 'OR', 'IN') and _col11 BETWEEN 150 AND 300) or ((_col1) IN ('WI', 'MO', 'WV') and _col11 BETWEEN 50 AND 250))) (type: boolean)
>                   Statistics: Num rows: 1195244064 Data size: 117133918272 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col17 (type: int), _col19 (type: float), _col20 (type: float), _col9 (type: int), _col12 (type: int)
>                     outputColumnNames: _col11, _col13, _col14, _col3, _col6
>                     Statistics: Num rows: 1195244064 Data size: 14342928768 Basic stats: COMPLETE Column stats: COMPLETE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 
>                         1 {_col3} {_col11} {_col13} {_col14}
>                       keys:
>                         0 _col0 (type: int)
>                         1 _col6 (type: int)
>                       outputColumnNames: _col5, _col13, _col15, _col16
>                       input vertices:
>                         0 Map 11
>                       Statistics: Num rows: 1334416318 Data size: 16012995816 Basic stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col13 (type: int), _col15 (type: float), _col16 (type: float), _col5 (type: int)
>                         outputColumnNames: _col13, _col15, _col16, _col5
>                         Statistics: Num rows: 1334416318 Data size: 16012995816 Basic stats: COMPLETE Column stats: COMPLETE
>                         Map Join Operator
>                           condition map:
>                                Inner Join 0 to 1
>                           condition expressions:
>                             0 {_col1}
>                             1 {_col5} {_col15} {_col16}
>                           keys:
>                             0 _col0 (type: int)
>                             1 _col13 (type: int)
>                           outputColumnNames: _col1, _col7, _col17, _col18
>                           input vertices:
>                             0 Map 12
>                           Statistics: Num rows: 1334416256 Data size: 140113706880 Basic stats: COMPLETE Column stats: COMPLETE
>                           Select Operator
>                             expressions: _col1 (type: string), _col7 (type: int), _col18 (type: float), _col17 (type: float)
>                             outputColumnNames: _col0, _col1, _col2, _col3
>                             Statistics: Num rows: 1334416256 Data size: 140113706880 Basic stats: COMPLETE Column stats: COMPLETE
>                             Group By Operator
>                               aggregations: avg(_col1), avg(_col2), avg(_col3)
>                               keys: _col0 (type: string)
>                               mode: hash
>                               outputColumnNames: _col0, _col1, _col2, _col3
>                               Statistics: Num rows: 157024 Data size: 15231328 Basic stats: COMPLETE Column stats: COMPLETE
>                               Reduce Output Operator
>                                 key expressions: _col0 (type: string)
>                                 sort order: +
>                                 Map-reduce partition columns: _col0 (type: string)
>                                 Statistics: Num rows: 157024 Data size: 15231328 Basic stats: COMPLETE Column stats: COMPLETE
>                                 value expressions: _col1 (type: struct<count:bigint,sum:double,input:int>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: struct<count:bigint,sum:double,input:float>)
>         Reducer 7 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2)
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 112 Data size: 13552 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: substr(_col0, 1, 20) (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
>                     sort order: ++++
>                     Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE
>                     TopN Hash Memory Usage: 0.04
>         Reducer 8 
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), KEY.reducesinkkey3 (type: double)
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE
>                 Limit
>                   Number of rows: 100
>                   Statistics: Num rows: 100 Data size: 20800 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 100 Data size: 20800 Basic stats: COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>             Execution mode: vectorized
>   Stage: Stage-0
> {code}
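In the plan above, the web_sales scan (Map 9) filters only on IS NOT NULL, and the per-disjunct conditions on ws_sales_price, ws_net_profit and ca_state are evaluated only in the post-join Filter Operators of Reducer 4 and Reducer 6. Because each of those conditions is a range or an IN list on a single column, their OR could be collapsed into one range or IN list per column. The helpers below (`merge_ranges`, `merge_in_lists`) are a hypothetical sketch of that collapse, assuming closed intervals as with BETWEEN, not code from Hive:

```python
from typing import Iterable, List, Tuple

Range = Tuple[float, float]  # closed interval [lo, hi], as in "col BETWEEN lo AND hi"


def merge_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Union of closed intervals; overlapping or touching intervals are merged."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def merge_in_lists(lists: Iterable[Iterable[str]]) -> List[str]:
    """Union of the IN lists from each disjunct, in sorted order."""
    return sorted(set().union(*lists))


if __name__ == "__main__":
    # Values taken from the query's WHERE clause.
    print(merge_ranges([(100.0, 150.0), (50.0, 100.0), (150.0, 200.0)]))  # ws_sales_price
    print(merge_ranges([(100, 200), (150, 300), (50, 250)]))              # ws_net_profit
    print(merge_in_lists([("KY", "GA", "NM"), ("MT", "OR", "IN"), ("WI", "MO", "WV")]))
```

With the values from the query this gives `ws_sales_price` in [50.00, 200.00], `ws_net_profit` in [50, 300] and a nine-state `ca_state` list. These are necessary but not sufficient conditions, so the original disjunctions would still have to be evaluated after the joins.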



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
