> On Dec. 18, 2013, 1:47 a.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java, line 242
> > <https://reviews.apache.org/r/16172/diff/1/?file=396419#file396419line242>
> >
> > aliasToKnownSize can also contain tables which will not be used in the
> > next job. For example, we have a query like SELECT ... FROM a JOIN b ON
> > (a.key1=b.key1) JOIN c ON (a.key2=b.key). Let's also assume that "a" is the
> > big table. We can first use a Map only job to do a JOIN b. Then, we should
> > evaluate the size of table c and the result of a JOIN b. But, at here,
> > aliasToKnownSize also has the size of table a which will be counted in
> > sumOfOthers.

No, it is not. Here are the log messages:

[ConditionalResolverCommonJoin/resolveMapJoinTask] aliasToKnownSize : {b=11624, c=11624, a=11624}
[ConditionalResolverCommonJoin/resolveMapJoinTask] aliases : [b, a]
[ConditionalResolverCommonJoin/resolveMapJoinTask] aliasToKnownSize : {b=11624, c=11624, a=11624, $INTNAME=167608}
[ConditionalResolverCommonJoin/resolveMapJoinTask] aliases : [c, $INTNAME]


> On Dec. 18, 2013, 1:47 a.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java, line 467
> > <https://reviews.apache.org/r/16172/diff/1/?file=396418#file396418line467>
> >
> > A question which is not very related to this issue. Have we documented
> > that we prefer the right most alias as the big table? I also see we have
> > such assumption in JoinOperator.

Preferring the rightmost alias is first introduced in this patch (previously it was decided by the iteration order of aliasToWork), which changes the result of auto_join25.q. (This part of the change is not related to this issue, but I thought it was too confusing to understand as it was.)


> On Dec. 18, 2013, 1:47 a.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java, line 255
> > <https://reviews.apache.org/r/16172/diff/1/?file=396419#file396419line255>
> >
> > Let's change it to log the exception instead of printing the stack
> > trace.

OK.


- Navis


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16172/#review30594
-----------------------------------------------------------


On Dec. 11, 2013, 2:12 a.m., Navis Ryu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16172/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2013, 2:12 a.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-5945
>     https://issues.apache.org/jira/browse/HIVE-5945
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Here is an example
> {code}
> select
>   i_item_id,
>   s_state,
>   avg(ss_quantity) agg1,
>   avg(ss_list_price) agg2,
>   avg(ss_coupon_amt) agg3,
>   avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk =
>   customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>   cd_gender = 'F' and
>   cd_marital_status = 'U' and
>   cd_education_status = 'Primary' and
>   d_year = 2002 and
>   s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>   i_item_id,
>   s_state
> order by
>   i_item_id,
>   s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 Map-only jobs for this
> query. However, I got 1 Map-only job (joining store_sales and date_dim)
> and 3 MR jobs (for reduce joins).
>
> So, I checked the conditional task determining the plan of the join involving
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask,
> aliasToFileSizeMap contains all input tables used in this query and the
> intermediate table generated by joining store_sales and date_dim. So, when we
> sum the size of all small tables, the size of store_sales (which is around
> 45GB in my test) will also be counted.
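
To make the size check in the description concrete, here is a rough sketch of the difference between summing over every alias with a known size and summing only over the aliases that take part in the current join. This is not the code in ConditionalResolverCommonJoin; the class name, the method names, and the -1 convention for an unknown size are all made up for illustration. The numbers are the ones from the second pair of log lines above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and conventions are assumptions, not Hive's API.
class JoinSizeSketch {

    // Sums every known size except the big table's, regardless of whether
    // the alias is an input of the current join (the behavior the
    // description complains about).
    static long sumOverAllKnown(Map<String, Long> aliasToKnownSize, String bigAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToKnownSize.entrySet()) {
            if (!e.getKey().equals(bigAlias)) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Sums sizes only for the aliases that participate in the current join,
    // skipping the big table. Returns -1 if any participant's size is unknown
    // (an assumed convention for this sketch).
    static long sumOfOthers(Map<String, Long> aliasToKnownSize,
                            List<String> participants, String bigAlias) {
        long sum = 0;
        for (String alias : participants) {
            if (alias.equals(bigAlias)) {
                continue;
            }
            Long size = aliasToKnownSize.get(alias);
            if (size == null) {
                return -1;
            }
            sum += size;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("b", 11624L);
        sizes.put("c", 11624L);
        sizes.put("a", 11624L);
        sizes.put("$INTNAME", 167608L);
        List<String> aliases = Arrays.asList("c", "$INTNAME");

        System.out.println(sumOverAllKnown(sizes, "$INTNAME"));      // prints 34872
        System.out.println(sumOfOthers(sizes, aliases, "$INTNAME")); // prints 11624
    }
}
```

With the participants restricted to [c, $INTNAME], only c is compared against the small-table threshold, so the sizes of a and b from the earlier join no longer count.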
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 197a20f
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 2efa7c2
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java faf2f9b
>   ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java 67203c9
>   ql/src/test/results/clientpositive/auto_join25.q.out 7427239
>
> Diff: https://reviews.apache.org/r/16172/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Navis Ryu
>
>