[ 
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851260#comment-13851260
 ] 

Yin Huai commented on HIVE-5945:
--------------------------------

Thanks [~navis] :) I left a few comments on the review board. I think the
conditional task in the original trunk is not well tested. A .q test file
cannot verify that a conditional task picks the right execution plan, because
its output only shows the plan and the query result. I think we need a JUnit
test that directly checks the decision made by resolveMapJoinTask. Also,
let's add some logging in resolveMapJoinTask. Right now, the only messages are
"xx is filtered out by condition resolver." and "xx is selected by condition
resolver." in ConditionalTask. From these two messages, we cannot tell why an
execution plan was selected. In resolveMapJoinTask, we can first log the sizes
of the tables that will be used in the next task and then log why a path is
selected.
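
To make the logging and testing suggestion concrete, here is a minimal sketch
of a resolver-style decision that records the input sizes and the reason for
each accept/reject. It is not Hive's ConditionalResolverCommonJoin: the class
name, method name, log wording, and the selection rule (pick the big table
whose remaining inputs fit under a threshold) are all assumptions made for
illustration, and aliasToSize is assumed to hold only the inputs of the next
task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and rules are illustrative, not Hive's code.
final class ResolverLoggingSketch {

    /**
     * Picks the big-table alias whose remaining (small) inputs fit under the
     * threshold, preferring the smallest small-table sum. Returns null when
     * no candidate fits. Sizes and accept/reject reasons are appended to log.
     */
    static String chooseBigTable(Map<String, Long> aliasToSize,
                                 long smallTableThreshold,
                                 List<String> log) {
        long total = 0;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            log.add("input " + e.getKey() + " size=" + e.getValue());
            total += e.getValue();
        }
        String chosen = null;
        long bestSmallSum = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            // Everything except the candidate big table must fit in memory.
            long smallSum = total - e.getValue();
            if (smallSum > smallTableThreshold) {
                log.add("reject big=" + e.getKey() + ": small tables sum to "
                        + smallSum + " > threshold " + smallTableThreshold);
            } else if (smallSum < bestSmallSum) {
                log.add("candidate big=" + e.getKey()
                        + ": small tables sum to " + smallSum);
                chosen = e.getKey();
                bestSmallSum = smallSum;
            }
        }
        log.add(chosen == null
                ? "no map join fits; falling back to common join"
                : "selected big=" + chosen);
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("intermediate", 1000L); // hypothetical sizes in bytes
        sizes.put("item", 50L);
        List<String> log = new ArrayList<>();
        String chosen = chooseBigTable(sizes, 100L, log);
        for (String line : log) {
            System.out.println(line);
        }
        System.out.println("chosen = " + chosen);
    }
}
```

Because the decision is a plain function of sizes and a threshold, a JUnit
test can assert on the returned alias directly, which a .q file cannot do.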

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those 
> tables which are not used in the child of this conditional task.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5945
>                 URL: https://issues.apache.org/jira/browse/HIVE-5945
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt
>
>
> Here is an example
> {code}
> select
>    i_item_id,
>    s_state,
>    avg(ss_quantity) agg1,
>    avg(ss_list_price) agg2,
>    avg(ss_coupon_amt) agg3,
>    avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>    cd_gender = 'F' and
>    cd_marital_status = 'U' and
>    cd_education_status = 'Primary' and
>    d_year = 2002 and
>    s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>    i_item_id,
>    s_state
> order by
>    i_item_id,
>    s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 Map-only jobs for this
> query. However, I got 1 Map-only job (joining store_sales and date_dim) and
> 3 MR jobs (for reduce joins).
> So I checked the conditional task that determines the plan of the join
> involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask,
> aliasToFileSizeMap contains all input tables used in this query and the
> intermediate table generated by joining store_sales and date_dim. So, when we
> sum the sizes of all small tables, the size of store_sales (which is around
> 45GB in my test) is also counted.
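
The effect described above can be illustrated with a small sketch that sums
small-table sizes two ways: over every alias in the map, and over only the
aliases that feed the join being resolved. This is not the Hive code. The
class and method names, the "intermediate" alias, and every size except the
roughly 45GB store_sales figure are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the size accounting; not Hive's implementation.
final class SmallTableSumSketch {

    /** Sums every alias except the big one (the behavior reported as a bug). */
    static long sumAllExcept(Map<String, Long> aliasToSize, String bigAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            if (!e.getKey().equals(bigAlias)) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    /** Sums only aliases that take part in this join, except the big one. */
    static long sumParticipantsExcept(Map<String, Long> aliasToSize,
                                      Set<String> participants,
                                      String bigAlias) {
        long sum = 0;
        for (String alias : participants) {
            Long size = aliasToSize.get(alias);
            if (size != null && !alias.equals(bigAlias)) {
                sum += size;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        final long GB = 1024L * MB;
        Map<String, Long> aliasToSize = new LinkedHashMap<>();
        aliasToSize.put("store_sales", 45L * GB);   // figure from the report
        aliasToSize.put("date_dim", 10L * MB);      // hypothetical
        aliasToSize.put("item", 5L * MB);           // hypothetical
        aliasToSize.put("intermediate", 2L * GB);   // hypothetical

        // The join involving item reads only the intermediate table and item.
        Set<String> participants =
                new HashSet<>(Arrays.asList("intermediate", "item"));

        System.out.println("all aliases:  "
                + sumAllExcept(aliasToSize, "intermediate"));
        System.out.println("participants: "
                + sumParticipantsExcept(aliasToSize, participants, "intermediate"));
    }
}
```

With the first sum, store_sales alone pushes the total past any realistic
small-table threshold, so the map join is rejected even though item is small.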



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
