[ 
https://issues.apache.org/jira/browse/HIVE-24357?focusedWorklogId=509843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509843
 ]

ASF GitHub Bot logged work on HIVE-24357:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Nov/20 17:34
            Start Date: 10/Nov/20 17:34
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1653:
URL: https://github.com/apache/hive/pull/1653#discussion_r517869934



##########
File path: ql/src/test/results/clientpositive/perf/tez/query61.q.out
##########
@@ -165,70 +167,81 @@ Stage-0
                       SHUFFLE [RS_38]
                         PartitionCols:_col2
                         Merge Join Operator [MERGEJOIN_256] (rows=2526982 
width=0)
-                          
Conds:RS_30._col4=RS_290._col0(Inner),Output:["_col2","_col5"]
-                        <-Map 20 [SIMPLE_EDGE] vectorized
-                          SHUFFLE [RS_290]
+                          
Conds:RS_30._col4=RS_295._col0(Inner),Output:["_col2","_col5"]
+                        <-Map 21 [SIMPLE_EDGE] vectorized
+                          SHUFFLE [RS_295]
                             PartitionCols:_col0
-                            Select Operator [SEL_289] (rows=2300 width=4)
+                            Select Operator [SEL_294] (rows=2300 width=4)
                               Output:["_col0"]
-                              Filter Operator [FIL_288] (rows=2300 width=259)
+                              Filter Operator [FIL_293] (rows=2300 width=259)
                                 predicate:(((p_channel_dmail = 'Y') or 
(p_channel_email = 'Y') or (p_channel_tv = 'Y')) and p_promo_sk is not null)
                                 TableScan [TS_18] (rows=2300 width=259)
                                   
default@promotion,promotion,Tbl:COMPLETE,Col:COMPLETE,Output:["p_promo_sk","p_channel_dmail","p_channel_email","p_channel_tv"]
                         <-Reducer 12 [SIMPLE_EDGE]
                           SHUFFLE [RS_30]
                             PartitionCols:_col4
                             Merge Join Operator [MERGEJOIN_255] (rows=2526982 
width=0)
-                              
Conds:RS_27._col3=RS_286._col0(Inner),Output:["_col2","_col4","_col5"]
-                            <-Map 19 [SIMPLE_EDGE] vectorized
-                              SHUFFLE [RS_286]
+                              
Conds:RS_27._col3=RS_291._col0(Inner),Output:["_col2","_col4","_col5"]
+                            <-Map 20 [SIMPLE_EDGE] vectorized
+                              SHUFFLE [RS_291]
                                 PartitionCols:_col0
-                                Select Operator [SEL_285] (rows=341 width=4)
+                                Select Operator [SEL_290] (rows=341 width=4)
                                   Output:["_col0"]
-                                  Filter Operator [FIL_284] (rows=341 
width=115)
+                                  Filter Operator [FIL_289] (rows=341 
width=115)
                                     predicate:((s_gmt_offset = -7) and 
s_store_sk is not null)
                                     TableScan [TS_15] (rows=1704 width=115)
                                       
default@store,store,Tbl:COMPLETE,Col:COMPLETE,Output:["s_store_sk","s_gmt_offset"]
                             <-Reducer 11 [SIMPLE_EDGE]
                               SHUFFLE [RS_27]
                                 PartitionCols:_col3
                                 Merge Join Operator [MERGEJOIN_254] 
(rows=12627499 width=0)
-                                  
Conds:RS_24._col1=RS_282._col0(Inner),Output:["_col2","_col3","_col4","_col5"]
-                                <-Map 18 [SIMPLE_EDGE] vectorized
-                                  SHUFFLE [RS_282]
+                                  
Conds:RS_24._col1=RS_287._col0(Inner),Output:["_col2","_col3","_col4","_col5"]
+                                <-Map 19 [SIMPLE_EDGE] vectorized
+                                  SHUFFLE [RS_287]
                                     PartitionCols:_col0
-                                    Select Operator [SEL_281] (rows=46200 
width=4)
+                                    Select Operator [SEL_286] (rows=46200 
width=4)
                                       Output:["_col0"]
-                                      Filter Operator [FIL_280] (rows=46200 
width=94)
+                                      Filter Operator [FIL_285] (rows=46200 
width=94)
                                         predicate:((i_category = 
'Electronics') and i_item_sk is not null)
                                         TableScan [TS_12] (rows=462000 
width=94)
                                           
default@item,item,Tbl:COMPLETE,Col:COMPLETE,Output:["i_item_sk","i_category"]
                                 <-Reducer 10 [SIMPLE_EDGE]
                                   SHUFFLE [RS_24]
                                     PartitionCols:_col1
                                     Merge Join Operator [MERGEJOIN_253] 
(rows=13119234 width=4)
-                                      
Conds:RS_274._col0=RS_278._col0(Inner),Output:["_col1","_col2","_col3","_col4","_col5"]
+                                      
Conds:RS_283._col0=RS_272._col0(Inner),Output:["_col1","_col2","_col3","_col4","_col5"]
                                     <-Map 17 [SIMPLE_EDGE] vectorized
-                                      SHUFFLE [RS_278]
+                                      PARTITION_ONLY_SHUFFLE [RS_272]
                                         PartitionCols:_col0
-                                        Select Operator [SEL_277] (rows=50 
width=4)
+                                        Select Operator [SEL_271] (rows=50 
width=4)
                                           Output:["_col0"]
-                                          Filter Operator [FIL_276] (rows=50 
width=12)
+                                          Filter Operator [FIL_270] (rows=50 
width=12)
                                             predicate:((d_year = 1999) and 
(d_moy = 11) and d_date_sk is not null)
                                             TableScan [TS_9] (rows=73049 
width=12)
                                               
default@date_dim,date_dim,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_year","d_moy"]
                                     <-Map 9 [SIMPLE_EDGE] vectorized
-                                      SHUFFLE [RS_274]
+                                      SHUFFLE [RS_283]
                                         PartitionCols:_col0
-                                        Select Operator [SEL_272] 
(rows=479120969 width=126)
+                                        Select Operator [SEL_281] 
(rows=479120969 width=126)
                                           
Output:["_col0","_col1","_col2","_col3","_col4","_col5"]
-                                          Filter Operator [FIL_270] 
(rows=479120969 width=126)
-                                            predicate:(ss_sold_date_sk is not 
null and ss_promo_sk is not null and ss_customer_sk is not null and ss_store_sk 
is not null and ss_item_sk is not null)
+                                          Filter Operator [FIL_279] 
(rows=479120969 width=126)
+                                            predicate:(ss_sold_date_sk is not 
null and ss_promo_sk is not null and ss_customer_sk is not null and ss_store_sk 
is not null and ss_item_sk is not null and ss_sold_date_sk BETWEEN 
DynamicValue(RS_22_date_dim_d_date_sk_min) AND 
DynamicValue(RS_22_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_22_date_dim_d_date_sk_bloom_filter)) and ((ss_sold_date_sk is 
not null and ss_promo_sk is not null and ss_customer_sk is not null and 
ss_store_sk is not null and ss_item_sk is not null) or (ss_sold_date_sk is not 
null and ss_customer_sk is not null and ss_store_sk is not null and ss_item_sk 
is not null)))
                                             TableScan [TS_6] (rows=575995635 
width=126)
                                               
default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_sold_date_sk","ss_item_sk","ss_customer_sk","ss_store_sk","ss_promo_sk","ss_ext_sales_price"]
+                                            <-Reducer 18 [BROADCAST_EDGE] 
vectorized

Review comment:
       this SJ filter became absent after HIVE-23939 (ts schema merge patch)
   
   the merged TS operators were retained in the working list when the schema 
merge related logic was called - I have not digged deeper into this - taking a 
look at the input plan this SJ filter makes sense to me
   

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out
##########
@@ -622,148 +619,103 @@ Stage-3
                                   <-Reducer 16 [CUSTOM_SIMPLE_EDGE] vectorized
                                     PARTITION_ONLY_SHUFFLE [RS_1234]
                                        Please refer to the previous Select 
Operator [SEL_1232]
-                              <-Reducer 43 [CUSTOM_SIMPLE_EDGE] vectorized
-                                PARTITION_ONLY_SHUFFLE [RS_1330]
-                                  Filter Operator [FIL_1329] (rows=72 
width=131)
+                              <-Reducer 42 [CUSTOM_SIMPLE_EDGE] vectorized
+                                PARTITION_ONLY_SHUFFLE [RS_1321]
+                                  Filter Operator [FIL_1320] (rows=72 
width=131)
                                     predicate:_col3 is not null
-                                    Group By Operator [GBY_1328] (rows=72 
width=131)
+                                    Group By Operator [GBY_1319] (rows=72 
width=131)
                                       
Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"],keys:KEY._col0,
 KEY._col1, KEY._col2
-                                    <-Reducer 42 [SIMPLE_EDGE]
+                                    <-Reducer 41 [SIMPLE_EDGE]
                                       SHUFFLE [RS_359]
                                         PartitionCols:_col0, _col1, _col2
                                         Group By Operator [GBY_358] (rows=72 
width=131)
                                           
Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["sum(_col3)","count()"],keys:_col0,
 _col1, _col2
                                           Select Operator [SEL_356] 
(rows=12217 width=10)
                                             
Output:["_col0","_col1","_col2","_col3"]
                                             Merge Join Operator 
[MERGEJOIN_1150] (rows=12217 width=10)
-                                              
Conds:RS_353._col1=RS_1289._col0(Inner),Output:["_col2","_col3","_col13","_col14","_col15"]
-                                            <-Map 73 [SIMPLE_EDGE] vectorized
-                                              SHUFFLE [RS_1289]
+                                              
Conds:RS_353._col1=RS_1283._col0(Inner),Output:["_col2","_col3","_col13","_col14","_col15"]
+                                            <-Map 70 [SIMPLE_EDGE] vectorized
+                                              SHUFFLE [RS_1283]
                                                 PartitionCols:_col0
-                                                Select Operator [SEL_1280] 
(rows=462000 width=15)
+                                                Select Operator [SEL_1277] 
(rows=462000 width=15)
                                                   
Output:["_col0","_col1","_col2","_col3"]
                                                    Please refer to the 
previous TableScan [TS_154]
-                                            <-Reducer 41 [SIMPLE_EDGE]
+                                            <-Reducer 40 [SIMPLE_EDGE]
                                               SHUFFLE [RS_353]
                                                 PartitionCols:_col1
                                                 Merge Join Operator 
[MERGEJOIN_1149] (rows=12217 width=4)
-                                                  Conds:RS_350._col6, _col7, 
_col8=RS_1317._col0, _col1, _col2(Inner),Output:["_col1","_col2","_col3"]
-                                                <-Reducer 40 [SIMPLE_EDGE] 
vectorized
-                                                  SHUFFLE [RS_1317]
+                                                  Conds:RS_350._col6, _col7, 
_col8=RS_1308._col0, _col1, _col2(Inner),Output:["_col1","_col2","_col3"]
+                                                <-Reducer 39 [SIMPLE_EDGE] 
vectorized
+                                                  SHUFFLE [RS_1308]
                                                     PartitionCols:_col0, 
_col1, _col2
-                                                    Select Operator [SEL_1316] 
(rows=1 width=12)
+                                                    Select Operator [SEL_1307] 
(rows=1 width=12)
                                                       
Output:["_col0","_col1","_col2"]
-                                                      Filter Operator 
[FIL_1315] (rows=1 width=20)
+                                                      Filter Operator 
[FIL_1306] (rows=1 width=20)
                                                         predicate:(_col3 = 3L)
-                                                        Group By Operator 
[GBY_1314] (rows=121728 width=19)
+                                                        Group By Operator 
[GBY_1305] (rows=121728 width=19)
                                                           
Output:["_col0","_col1","_col2","_col3"],aggregations:["count(VALUE._col0)"],keys:KEY._col0,
 KEY._col1, KEY._col2
-                                                        <-Union 39 
[SIMPLE_EDGE]
-                                                          <-Reducer 38 
[CONTAINS] vectorized
-                                                            Reduce Output 
Operator [RS_1353]
+                                                        <-Union 38 
[SIMPLE_EDGE]
+                                                          <-Reducer 37 
[CONTAINS] vectorized
+                                                            Reduce Output 
Operator [RS_1344]
                                                               
PartitionCols:_col0, _col1, _col2
-                                                              Group By 
Operator [GBY_1352] (rows=121728 width=19)
+                                                              Group By 
Operator [GBY_1343] (rows=121728 width=19)
                                                                 
Output:["_col0","_col1","_col2","_col3"],aggregations:["count(_col3)"],keys:_col0,
 _col1, _col2
-                                                                Group By 
Operator [GBY_1351] (rows=121728 width=19)
+                                                                Group By 
Operator [GBY_1342] (rows=121728 width=19)
                                                                   
Output:["_col0","_col1","_col2","_col3"],aggregations:["count(VALUE._col0)"],keys:KEY._col0,
 KEY._col1, KEY._col2
-                                                                <-Reducer 37 
[SIMPLE_EDGE]
+                                                                <-Reducer 33 
[SIMPLE_EDGE]
                                                                   SHUFFLE 
[RS_291]
                                                                     
PartitionCols:_col0, _col1, _col2
-                                                                    Group By 
Operator [GBY_290] (rows=121728 width=19)
-                                                                      
Output:["_col0","_col1","_col2","_col3"],aggregations:["count()"],keys:_col4, 
_col5, _col6
-                                                                      Merge 
Join Operator [MERGEJOIN_1144] (rows=14628613 width=11)
-                                                                        
Conds:RS_286._col1=RS_1297._col0(Inner),Output:["_col4","_col5","_col6"]
-                                                                      <-Map 73 
[SIMPLE_EDGE] vectorized
-                                                                        
SHUFFLE [RS_1297]
-                                                                          
PartitionCols:_col0
-                                                                          
Select Operator [SEL_1286] (rows=458612 width=15)
-                                                                            
Output:["_col0","_col1","_col2","_col3"]
-                                                                            
Filter Operator [FIL_1277] (rows=458612 width=15)
-                                                                              
predicate:(i_category_id is not null and i_brand_id is not null and i_class_id 
is not null)
-                                                                               
Please refer to the previous TableScan [TS_154]
-                                                                      
<-Reducer 32 [SIMPLE_EDGE]
-                                                                        
SHUFFLE [RS_286]
-                                                                          
PartitionCols:_col1
-                                                                           
Please refer to the previous Merge Join Operator [MERGEJOIN_1132]
-                                                          <-Reducer 63 
[CONTAINS] vectorized
-                                                            Reduce Output 
Operator [RS_1362]
+                                                                     Please 
refer to the previous Group By Operator [GBY_173]

Review comment:
       I see 2 new subtree reusals in this plan - which have resulted in the 
removal of 4 vertices

##########
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query23.q.out
##########
@@ -255,7 +255,7 @@ Stage-0
                                                             Select Operator 
[SEL_421] (rows=525327388 width=119)
                                                               
Output:["_col0","_col1","_col2"]
                                                               Filter Operator 
[FIL_417] (rows=525327388 width=118)
-                                                                
predicate:(ss_sold_date_sk is not null and ss_customer_sk is not null and 
ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND 
DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)) and (ss_customer_sk is not 
null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and 
((ss_customer_sk BETWEEN 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and 
in_bloom_filter(ss_customer_sk, 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or 
(ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND 
DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)))) and (((ss_customer_sk is 
not null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and 
((ss_customer_sk BETWEEN 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and 
in_bloom_filter(ss_customer_sk, 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or 
(ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND 
DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter))))) or ss_sold_date_sk is 
not null) and (ss_sold_date_sk BETWEEN 
DynamicValue(RS_58_date_dim_d_date_sk_min) AND 
DynamicValue(RS_58_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_58_date_dim_d_date_sk_bloom_filter))))
+                                                                
predicate:(ss_sold_date_sk is not null and ss_customer_sk is not null and 
ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND 
DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)) and (ss_customer_sk is not 
null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and 
((ss_customer_sk BETWEEN 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and 
in_bloom_filter(ss_customer_sk, 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or 
(ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND 
DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)))) and (((ss_customer_sk is 
not null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and 
((ss_customer_sk BETWEEN 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and 
in_bloom_filter(ss_customer_sk, 
DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or 
(ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND 
DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter))))) or ss_sold_date_sk is 
not null) and ss_sold_date_sk BETWEEN 
DynamicValue(RS_58_date_dim_d_date_sk_min) AND 
DynamicValue(RS_58_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, 
DynamicValue(RS_58_date_dim_d_date_sk_bloom_filter)))

Review comment:
       HIVE-24365 will address these conditions




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 509843)
    Time Spent: 20m  (was: 10m)

> Exchange SWO table/algorithm strategy
> -------------------------------------
>
>                 Key: HIVE-24357
>                 URL: https://issues.apache.org/jira/browse/HIVE-24357
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: swo.before.jointree.dot.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this results in that an earlier startegy may create a more entangled operator 
> tree behind - in case its able to merge for a less prioritized table
> it would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to