[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838991#comment-13838991 ]
Yin Huai commented on HIVE-5945:
--------------------------------

aliasToFileSizeMap should contain only the aliases used in the next stage, not all tables.

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask sums all tables'
> sizes including those tables which are not used in the child of this
> conditional task.
> ------------------------------------------------------------------------
>
>                 Key: HIVE-5945
>                 URL: https://issues.apache.org/jira/browse/HIVE-5945
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Yin Huai
>
> Here is an example
> {code}
> select
>   i_item_id,
>   s_state,
>   avg(ss_quantity) agg1,
>   avg(ss_list_price) agg2,
>   avg(ss_coupon_amt) agg3,
>   avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk =
>   customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>   cd_gender = 'F' and
>   cd_marital_status = 'U' and
>   cd_education_status = 'Primary' and
>   d_year = 2002 and
>   s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>   i_item_id,
>   s_state
> order by
>   i_item_id,
>   s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 Map-only jobs for this
> query. However, I got 1 Map-only job (joining store_sales and date_dim)
> and 3 MR jobs (for reduce joins).
> So, I checked the conditional task determining the plan of the join involving
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask,
> aliasToFileSizeMap contains all input tables used in this query and the
> intermediate table generated by joining store_sales and date_dim. So, when
> we sum the size of all small tables, the size of store_sales (which is
> around 45GB in my test) is also counted.
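
The effect described above can be illustrated with a small standalone sketch. This is not Hive's actual code: the class `SmallTableSizeSketch`, both methods, the alias `intermediate_1`, the 25 MB threshold, and every size except the 45GB for store_sales are hypothetical values chosen for illustration. It compares summing every alias in the map with summing only the aliases that take part in the join being resolved.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Hive.
public class SmallTableSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;
    // Assumed small-table threshold, picked for the example.
    static final long THRESHOLD = 25_000_000L;

    // Reported behavior: every alias in the map except the big one is summed,
    // even aliases that are not inputs of the join being resolved.
    public static long sumAllExceptBig(Map<String, Long> aliasToFileSize, String bigAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToFileSize.entrySet()) {
            if (!e.getKey().equals(bigAlias)) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Suggested behavior: only aliases that participate in this join are summed.
    public static long sumParticipating(Map<String, Long> aliasToFileSize,
                                        Set<String> joinAliases, String bigAlias) {
        long sum = 0;
        for (String alias : joinAliases) {
            if (alias.equals(bigAlias)) {
                continue; // the big (streamed) side is never a small table
            }
            Long size = aliasToFileSize.get(alias);
            if (size != null) {
                sum += size;
            }
        }
        return sum;
    }

    // Builds the example map; only the store_sales size comes from the report.
    public static Map<String, Long> exampleSizes() {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("store_sales", 45L * GB);
        sizes.put("date_dim", 10L * MB);
        sizes.put("item", 5L * MB);
        sizes.put("customer_demographics", 80L * MB);
        sizes.put("store", 1L * MB);
        sizes.put("intermediate_1", 2L * GB); // result of store_sales JOIN date_dim
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = exampleSizes();
        // The join involving item reads only the intermediate table and item.
        Set<String> joinAliases = new HashSet<>(Arrays.asList("intermediate_1", "item"));

        long all = sumAllExceptBig(sizes, "intermediate_1");
        long used = sumParticipating(sizes, joinAliases, "intermediate_1");

        System.out.println("all aliases fit under threshold:  " + (all <= THRESHOLD));
        System.out.println("join aliases fit under threshold: " + (used <= THRESHOLD));
    }
}
```

With these assumed sizes, the first sum includes store_sales and so exceeds any realistic small-table limit, while the second covers only item. Whether the real fix should filter the map when it is built or at summation time is left open here.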