[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851756#comment-13851756 ]

Yin Huai commented on HIVE-5945:
--------------------------------

I left two minor comments on the review board.

Two additional comments. When we find
{code}
bigTableFileAlias != null
{code}
can we also log sumOfOthers and the threshold of the size of small tables? Then the log entry will show the size of the big table, the total size of the other small tables, and the threshold of the size of small tables. Also, can you add a unit test? Thanks :)

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5945
>                 URL: https://issues.apache.org/jira/browse/HIVE-5945
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt
>
>
> Here is an example
> {code}
> select
>   i_item_id,
>   s_state,
>   avg(ss_quantity) agg1,
>   avg(ss_list_price) agg2,
>   avg(ss_coupon_amt) agg3,
>   avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>   cd_gender = 'F' and
>   cd_marital_status = 'U' and
>   cd_education_status = 'Primary' and
>   d_year = 2002 and
>   s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>   i_item_id,
>   s_state
> order by
>   i_item_id,
>   s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 Map-only jobs for this query.
> However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce joins).
> So, I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the sizes of all small tables, the size of store_sales (which is around 45GB in my test) is also counted.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
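
The behaviour described in the issue, and the log entry requested in the comment, can be illustrated with a minimal Java sketch. This is not the actual ConditionalResolverCommonJoin code or the attached patch. The class `MapJoinSizeSketch`, its methods, the `participants` set, the alias name `intermediate`, and all sizes other than the 45GB for store_sales are hypothetical and chosen only for illustration. The idea is to sum only the aliases that take part in the join handled by this conditional task, and to report the big table size, sumOfOthers, and the threshold together.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: not the actual ConditionalResolverCommonJoin code. */
public class MapJoinSizeSketch {

    /**
     * Sums the sizes of the participating aliases other than the candidate big table.
     * Entries of aliasToFileSizeMap whose alias is not in participants are ignored,
     * so tables used elsewhere in the query do not inflate the sum.
     */
    static long sumOfOthers(Map<String, Long> aliasToFileSizeMap,
                            Set<String> participants,
                            String bigTableAlias) {
        long sum = 0L;
        for (String alias : participants) {
            if (alias.equals(bigTableAlias)) {
                continue; // the big table is streamed, not loaded as a small table
            }
            Long size = aliasToFileSizeMap.get(alias);
            if (size == null) {
                // Assumption of this sketch: every participant has a known size.
                throw new IllegalArgumentException("Unknown size for alias " + alias);
            }
            sum += size;
        }
        return sum;
    }

    /** True if the other participating tables together fit under the threshold. */
    static boolean smallTablesFit(long sumOfOthers, long threshold) {
        return sumOfOthers <= threshold;
    }

    /** Builds a log entry with the three values asked for in the review comment. */
    static String logLine(String bigAlias, long bigSize, long sumOfOthers, long threshold) {
        return "Big table " + bigAlias + " size: " + bigSize
                + ", sum of other tables: " + sumOfOthers
                + ", small table threshold: " + threshold;
    }

    public static void main(String[] args) {
        // All inputs of the query plus the intermediate result (hypothetical sizes).
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("store_sales", 45L * 1024 * 1024 * 1024);
        sizes.put("date_dim", 10L * 1024 * 1024);
        sizes.put("item", 5L * 1024 * 1024);
        sizes.put("intermediate", 2L * 1024 * 1024 * 1024);

        // Only these aliases take part in the join involving item.
        Set<String> participants = new LinkedHashSet<>();
        participants.add("intermediate");
        participants.add("item");

        long threshold = 25_000_000L; // hypothetical threshold in bytes
        long others = sumOfOthers(sizes, participants, "intermediate");
        System.out.println(logLine("intermediate", sizes.get("intermediate"), others, threshold));
        System.out.println("small tables fit: " + smallTablesFit(others, threshold));
    }
}
```

With the sum restricted to `participants`, store_sales no longer contributes to sumOfOthers for the join involving item, which is the condition under which that join could be planned as a map join.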