[ https://issues.apache.org/jira/browse/HIVE-24357?focusedWorklogId=509843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509843 ]
ASF GitHub Bot logged work on HIVE-24357: ----------------------------------------- Author: ASF GitHub Bot Created on: 10/Nov/20 17:34 Start Date: 10/Nov/20 17:34 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1653: URL: https://github.com/apache/hive/pull/1653#discussion_r517869934 ########## File path: ql/src/test/results/clientpositive/perf/tez/query61.q.out ########## @@ -165,70 +167,81 @@ Stage-0 SHUFFLE [RS_38] PartitionCols:_col2 Merge Join Operator [MERGEJOIN_256] (rows=2526982 width=0) - Conds:RS_30._col4=RS_290._col0(Inner),Output:["_col2","_col5"] - <-Map 20 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_290] + Conds:RS_30._col4=RS_295._col0(Inner),Output:["_col2","_col5"] + <-Map 21 [SIMPLE_EDGE] vectorized + SHUFFLE [RS_295] PartitionCols:_col0 - Select Operator [SEL_289] (rows=2300 width=4) + Select Operator [SEL_294] (rows=2300 width=4) Output:["_col0"] - Filter Operator [FIL_288] (rows=2300 width=259) + Filter Operator [FIL_293] (rows=2300 width=259) predicate:(((p_channel_dmail = 'Y') or (p_channel_email = 'Y') or (p_channel_tv = 'Y')) and p_promo_sk is not null) TableScan [TS_18] (rows=2300 width=259) default@promotion,promotion,Tbl:COMPLETE,Col:COMPLETE,Output:["p_promo_sk","p_channel_dmail","p_channel_email","p_channel_tv"] <-Reducer 12 [SIMPLE_EDGE] SHUFFLE [RS_30] PartitionCols:_col4 Merge Join Operator [MERGEJOIN_255] (rows=2526982 width=0) - Conds:RS_27._col3=RS_286._col0(Inner),Output:["_col2","_col4","_col5"] - <-Map 19 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_286] + Conds:RS_27._col3=RS_291._col0(Inner),Output:["_col2","_col4","_col5"] + <-Map 20 [SIMPLE_EDGE] vectorized + SHUFFLE [RS_291] PartitionCols:_col0 - Select Operator [SEL_285] (rows=341 width=4) + Select Operator [SEL_290] (rows=341 width=4) Output:["_col0"] - Filter Operator [FIL_284] (rows=341 width=115) + Filter Operator [FIL_289] (rows=341 width=115) predicate:((s_gmt_offset = -7) and s_store_sk is not null) TableScan [TS_15] (rows=1704 width=115) default@store,store,Tbl:COMPLETE,Col:COMPLETE,Output:["s_store_sk","s_gmt_offset"] <-Reducer 11 [SIMPLE_EDGE] SHUFFLE [RS_27] PartitionCols:_col3 Merge Join Operator [MERGEJOIN_254] (rows=12627499 width=0) - Conds:RS_24._col1=RS_282._col0(Inner),Output:["_col2","_col3","_col4","_col5"] - <-Map 18 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_282] + Conds:RS_24._col1=RS_287._col0(Inner),Output:["_col2","_col3","_col4","_col5"] + <-Map 19 [SIMPLE_EDGE] vectorized + SHUFFLE [RS_287] PartitionCols:_col0 - Select Operator [SEL_281] (rows=46200 width=4) + Select Operator [SEL_286] (rows=46200 width=4) Output:["_col0"] - Filter Operator [FIL_280] (rows=46200 width=94) + Filter Operator [FIL_285] (rows=46200 width=94) predicate:((i_category = 'Electronics') and i_item_sk is not null) TableScan [TS_12] (rows=462000 width=94) default@item,item,Tbl:COMPLETE,Col:COMPLETE,Output:["i_item_sk","i_category"] <-Reducer 10 [SIMPLE_EDGE] SHUFFLE [RS_24] PartitionCols:_col1 Merge Join Operator [MERGEJOIN_253] (rows=13119234 width=4) - Conds:RS_274._col0=RS_278._col0(Inner),Output:["_col1","_col2","_col3","_col4","_col5"] + Conds:RS_283._col0=RS_272._col0(Inner),Output:["_col1","_col2","_col3","_col4","_col5"] <-Map 17 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_278] + PARTITION_ONLY_SHUFFLE [RS_272] PartitionCols:_col0 - Select Operator [SEL_277] (rows=50 width=4) + Select Operator [SEL_271] (rows=50 width=4) Output:["_col0"] - Filter Operator [FIL_276] (rows=50 width=12) + Filter Operator [FIL_270] (rows=50 width=12) predicate:((d_year = 1999) and (d_moy = 11) and d_date_sk is not null) TableScan [TS_9] (rows=73049 width=12) default@date_dim,date_dim,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_year","d_moy"] <-Map 9 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_274] + SHUFFLE [RS_283] PartitionCols:_col0 - Select Operator [SEL_272] (rows=479120969 width=126) + Select Operator [SEL_281] (rows=479120969 width=126) Output:["_col0","_col1","_col2","_col3","_col4","_col5"] - Filter Operator [FIL_270] (rows=479120969 width=126) - predicate:(ss_sold_date_sk is not null and ss_promo_sk is not null and ss_customer_sk is not null and ss_store_sk is not null and ss_item_sk is not null) + Filter Operator [FIL_279] (rows=479120969 width=126) + predicate:(ss_sold_date_sk is not null and ss_promo_sk is not null and ss_customer_sk is not null and ss_store_sk is not null and ss_item_sk is not null and ss_sold_date_sk BETWEEN DynamicValue(RS_22_date_dim_d_date_sk_min) AND DynamicValue(RS_22_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_22_date_dim_d_date_sk_bloom_filter)) and ((ss_sold_date_sk is not null and ss_promo_sk is not null and ss_customer_sk is not null and ss_store_sk is not null and ss_item_sk is not null) or (ss_sold_date_sk is not null and ss_customer_sk is not null and ss_store_sk is not null and ss_item_sk is not null))) TableScan [TS_6] (rows=575995635 width=126) default@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_sold_date_sk","ss_item_sk","ss_customer_sk","ss_store_sk","ss_promo_sk","ss_ext_sales_price"] + <-Reducer 18 [BROADCAST_EDGE] vectorized Review comment: this SJ filter became absent after HIVE-23939 (ts schema merge patch) the merged TS operators were retained in the working list when the schema merge related logic was called - I have not digged deeper into this - taking a look at the input plan this SJ filter makes sense to me ########## File path: ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out ########## @@ -622,148 +619,103 @@ Stage-3 <-Reducer 16 [CUSTOM_SIMPLE_EDGE] vectorized PARTITION_ONLY_SHUFFLE [RS_1234] Please refer to the previous Select Operator [SEL_1232] - <-Reducer 43 [CUSTOM_SIMPLE_EDGE] vectorized - PARTITION_ONLY_SHUFFLE [RS_1330] - Filter Operator [FIL_1329] (rows=72 width=131) + <-Reducer 42 [CUSTOM_SIMPLE_EDGE] vectorized + PARTITION_ONLY_SHUFFLE [RS_1321] + Filter Operator [FIL_1320] (rows=72 width=131) predicate:_col3 is not null - Group By Operator [GBY_1328] (rows=72 width=131) + Group By Operator [GBY_1319] (rows=72 width=131) Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"],keys:KEY._col0, KEY._col1, KEY._col2 - <-Reducer 42 [SIMPLE_EDGE] + <-Reducer 41 [SIMPLE_EDGE] SHUFFLE [RS_359] PartitionCols:_col0, _col1, _col2 Group By Operator [GBY_358] (rows=72 width=131) Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["sum(_col3)","count()"],keys:_col0, _col1, _col2 Select Operator [SEL_356] (rows=12217 width=10) Output:["_col0","_col1","_col2","_col3"] Merge Join Operator [MERGEJOIN_1150] (rows=12217 width=10) - Conds:RS_353._col1=RS_1289._col0(Inner),Output:["_col2","_col3","_col13","_col14","_col15"] - <-Map 73 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_1289] + Conds:RS_353._col1=RS_1283._col0(Inner),Output:["_col2","_col3","_col13","_col14","_col15"] + <-Map 70 [SIMPLE_EDGE] vectorized + SHUFFLE [RS_1283] PartitionCols:_col0 - Select Operator [SEL_1280] (rows=462000 width=15) + Select Operator [SEL_1277] (rows=462000 width=15) Output:["_col0","_col1","_col2","_col3"] Please refer to the previous TableScan [TS_154] - <-Reducer 41 [SIMPLE_EDGE] + <-Reducer 40 [SIMPLE_EDGE] SHUFFLE [RS_353] PartitionCols:_col1 Merge Join Operator [MERGEJOIN_1149] (rows=12217 width=4) - Conds:RS_350._col6, _col7, _col8=RS_1317._col0, _col1, _col2(Inner),Output:["_col1","_col2","_col3"] - <-Reducer 40 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_1317] + Conds:RS_350._col6, _col7, _col8=RS_1308._col0, _col1, _col2(Inner),Output:["_col1","_col2","_col3"] + <-Reducer 39 [SIMPLE_EDGE] vectorized + SHUFFLE [RS_1308] PartitionCols:_col0, _col1, _col2 - Select Operator [SEL_1316] (rows=1 width=12) + Select Operator [SEL_1307] (rows=1 width=12) Output:["_col0","_col1","_col2"] - Filter Operator [FIL_1315] (rows=1 width=20) + Filter Operator [FIL_1306] (rows=1 width=20) predicate:(_col3 = 3L) - Group By Operator [GBY_1314] (rows=121728 width=19) + Group By Operator [GBY_1305] (rows=121728 width=19) Output:["_col0","_col1","_col2","_col3"],aggregations:["count(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 - <-Union 39 [SIMPLE_EDGE] - <-Reducer 38 [CONTAINS] vectorized - Reduce Output Operator [RS_1353] + <-Union 38 [SIMPLE_EDGE] + <-Reducer 37 [CONTAINS] vectorized + Reduce Output Operator [RS_1344] PartitionCols:_col0, _col1, _col2 - Group By Operator [GBY_1352] (rows=121728 width=19) + Group By Operator [GBY_1343] (rows=121728 width=19) Output:["_col0","_col1","_col2","_col3"],aggregations:["count(_col3)"],keys:_col0, _col1, _col2 - Group By Operator [GBY_1351] (rows=121728 width=19) + Group By Operator [GBY_1342] (rows=121728 width=19) Output:["_col0","_col1","_col2","_col3"],aggregations:["count(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 - <-Reducer 37 [SIMPLE_EDGE] + <-Reducer 33 [SIMPLE_EDGE] SHUFFLE [RS_291] PartitionCols:_col0, _col1, _col2 - Group By Operator [GBY_290] (rows=121728 width=19) - Output:["_col0","_col1","_col2","_col3"],aggregations:["count()"],keys:_col4, _col5, _col6 - Merge Join Operator [MERGEJOIN_1144] (rows=14628613 width=11) - Conds:RS_286._col1=RS_1297._col0(Inner),Output:["_col4","_col5","_col6"] - <-Map 73 [SIMPLE_EDGE] vectorized - SHUFFLE [RS_1297] - PartitionCols:_col0 - Select Operator [SEL_1286] (rows=458612 width=15) - Output:["_col0","_col1","_col2","_col3"] - Filter Operator [FIL_1277] (rows=458612 width=15) - predicate:(i_category_id is not null and i_brand_id is not null and i_class_id is not null) - Please refer to the previous TableScan [TS_154] - <-Reducer 32 [SIMPLE_EDGE] - SHUFFLE [RS_286] - PartitionCols:_col1 - Please refer to the previous Merge Join Operator [MERGEJOIN_1132] - <-Reducer 63 [CONTAINS] vectorized - Reduce Output Operator [RS_1362] + Please refer to the previous Group By Operator [GBY_173] Review comment: I see 2 new subtree reusals in this plan - which have resulted in the removal of 4 vertices ########## File path: ql/src/test/results/clientpositive/perf/tez/constraints/query23.q.out ########## @@ -255,7 +255,7 @@ Stage-0 Select Operator [SEL_421] (rows=525327388 width=119) Output:["_col0","_col1","_col2"] Filter Operator [FIL_417] (rows=525327388 width=118) - predicate:(ss_sold_date_sk is not null and ss_customer_sk is not null and ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)) and (ss_customer_sk is not null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and ((ss_customer_sk BETWEEN DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and in_bloom_filter(ss_customer_sk, DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)))) and (((ss_customer_sk is not null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and ((ss_customer_sk BETWEEN DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and in_bloom_filter(ss_customer_sk, DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter))))) or ss_sold_date_sk is not null) and (ss_sold_date_sk BETWEEN DynamicValue(RS_58_date_dim_d_date_sk_min) AND DynamicValue(RS_58_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_58_date_dim_d_date_sk_bloom_filter)))) + predicate:(ss_sold_date_sk is not null and ss_customer_sk is not null and ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)) and (ss_customer_sk is not null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and ((ss_customer_sk BETWEEN DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and in_bloom_filter(ss_customer_sk, DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter)))) and (((ss_customer_sk is not null or (ss_sold_date_sk is not null and ss_customer_sk is not null)) and ((ss_customer_sk BETWEEN DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_min) AND DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_max) and in_bloom_filter(ss_customer_sk, DynamicValue(RS_47_catalog_sales_cs_bill_customer_sk_bloom_filter))) or (ss_sold_date_sk BETWEEN DynamicValue(RS_26_date_dim_d_date_sk_min) AND DynamicValue(RS_26_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_26_date_dim_d_date_sk_bloom_filter))))) or ss_sold_date_sk is not null) and ss_sold_date_sk BETWEEN DynamicValue(RS_58_date_dim_d_date_sk_min) AND DynamicValue(RS_58_date_dim_d_date_sk_max) and in_bloom_filter(ss_sold_date_sk, DynamicValue(RS_58_date_dim_d_date_sk_bloom_filter))) Review comment: HIVE-24365 will address these conditions ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 509843) Time Spent: 20m (was: 10m) > Exchange SWO table/algorithm strategy > ------------------------------------- > > Key: HIVE-24357 > URL: https://issues.apache.org/jira/browse/HIVE-24357 > Project: Hive > Issue Type: Improvement > Reporter: Zoltan Haindrich > Assignee: Zoltan Haindrich > Priority: Major > Labels: pull-request-available > Attachments: swo.before.jointree.dot.png > > Time Spent: 20m > Remaining Estimate: 0h > > SWO right now runs like: > {code} > for every strategy s: for every table t: try s for t > {code} > this results in that an earlier startegy may create a more entangled operator > tree behind - in case its able to merge for a less prioritized table > it would probably make more sense to do: > {code} > for every table t: for every strategy s: try s for t > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)