"hive cannot merge joins of 16+ aliases with outer join into single stage." In our use case we use one table full outer join all other table to produce one big table, which may exceed 16 outer join limits and will be split into multi stage under hive 0.10. It become very slow under hive 0.10 while we run such query well under hive 0.9. I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot merge join 16+ aliases into single stage while hive 0.9 doesn't have such issue. could you explain this or give me some hint?
Thanks! -- wzc1989 已使用 Sparrow (http://www.sparrowmailapp.com/?sig) 在 2013年5月14日星期二,下午12:26,Navis류승우 写道: > The error message means hive cannot merge joins of 16+ aliases with > outer join into single stage. It was 8 way originally (HIVE-3411) but > expanded to 16 later. > > Check https://issues.apache.org/jira/browse/HIVE-3411 for details. > > 2013/5/14 wzc1989 <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related > > commits and the above explain result matches in hive 0.9 and hive 0.10, > > thanks! > > But I confuse about this error msg: > > > > JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer > > join(s) " + > > "cannot have more than 16 aliases"), > > > > does this mean in hive0.10 when we have more than 16 outer join the query > > plan will still have some bug? > > I test the sql below and find the explain result still diff between hive 0.9 > > and hive 0.10. > > > > explain select > > sum(a.value) val > > from default.test_join a > > left outer join default.test_join b on a.key = b.key > > left outer join default.test_join c on a.key = c.key > > left outer join default.test_join d on a.key = d.key > > left outer join default.test_join e on a.key = e.key > > left outer join default.test_join f on a.key = f.key > > left outer join default.test_join g on a.key = g.key > > left outer join default.test_join h on a.key = h.key > > left outer join default.test_join i on a.key = i.key > > left outer join default.test_join j on a.key = j.key > > left outer join default.test_join k on a.key = k.key > > left outer join default.test_join l on a.key = l.key > > left outer join default.test_join m on a.key = m.key > > left outer join default.test_join n on a.key = n.key > > left outer join default.test_join u on a.key = u.key > > left outer join default.test_join v on a.key = v.key > > left outer join default.test_join w on a.key = w.key > > left outer join default.test_join x on a.key = x.key > > left outer join default.test_join z on a.key = z.key > > > > > > -- > > wzc1989 > > 已使用 Sparrow > > > > 在 2013年3月29日星期五,上午9:34,Navis류승우 写道: > > > > The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212, > > HIVE-3464) and still not completely fixed even in trunk. > > > > Will be fixed shortly. > > > > 2013/3/29 wzc <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > > > The bug remains even if I apply the patch in HIVE-4206 :( The explain > > result hasn't change. > > > > > > 2013/3/28 Navis류승우 <navis....@nexr.com (mailto:navis....@nexr.com)> > > > > > > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206). > > > > Thanks for reporting it. > > > > 2013/3/24 wzc <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > > > Recently we tried to upgrade our hive from 0.9 to 0.10, but found some > > of > > our hive queries almost 7 times slow. One of such query consists > > multiple > > table outer join on the same key. By looking into the query, we found > > the > > query plans generate by hive 0.9 and hive 0.10 are different. Here is > > the > > example: > > > > testcase: > > > > use default; > > create table test_join ( > > `key` string, > > `value` string > > ); > > > > explain select > > sum(a.value) val > > from default.test_join a > > left outer join default.test_join b on a.key = b.key > > left outer join default.test_join c on a.key = c.key > > left outer join default.test_join d on a.key = d.key > > left outer join default.test_join e on a.key = e.key > > left outer join default.test_join f on a.key = f.key > > left outer join default.test_join g on a.key = g.key > > > > > > the explain of hive 0.9: > > > > STAGE DEPENDENCIES: > > > > Stage-1 is a root stage > > > > Stage-2 depends on stages: Stage-1 > > > > Stage-0 is a root stage > > > > ... > > > > Reduce Operator Tree: > > > > Join Operator > > > > condition map: > > > > Left Outer Join0 to 1 > > > > Left Outer Join0 to 2 > > > > Left Outer Join0 to 3 > > > > Left Outer Join0 to 4 > > > > Left Outer Join0 to 5 > > > > Left Outer Join0 to 6 > > > > condition expressions: > > > > 0 {VALUE._col1} > > > > 1 > > > > 2 > > > > 3 > > > > 4 > > > > 5 > > > > 6 > > > > ...... > > > > > > while the explain of hive 0.10: > > > > STAGE DEPENDENCIES: > > > > Stage-6 is a root stage > > > > Stage-1 depends on stages: Stage-6 > > > > Stage-2 depends on stages: Stage-1 > > > > Stage-0 is a root stage > > > > ... > > > > Reduce Operator Tree: > > > > Join Operator > > > > condition map: > > > > Left Outer Join0 to 1 > > > > Left Outer Join0 to 2 > > > > condition expressions: > > > > 0 {VALUE._col0} {VALUE._col1} > > > > 1 > > > > 2 > > > > ... > > > > Reduce Operator Tree: > > > > Join Operator > > > > condition map: > > > > Left Outer Join0 to 1 > > > > Left Outer Join0 to 2 > > > > Left Outer Join0 to 3 > > > > Left Outer Join0 to 4 > > > > condition expressions: > > > > 0 {VALUE._col9} > > > > 1 > > > > 2 > > > > 3 > > > > 4 > > > > .... > > > > > > It seems like hive 0.9 use only one stage/job to process all outer joins > > but > > hive 0.10 split them into two stage. When running such kind of query on > > hive0.10 in production, in the second stage of outer join process, some > > reducer stucks. > > > > I can't find any param to change the query plain , can anyone give me > > some > > hint? > > > > Thanks!