Hi navis: Thanks for your reply. Currently I'm working on the temporary solution by changing the type of filter mask and doing the performance test. I try to read the patches and source code now and when I get better understanding of the code maybe I can help with this problem :)
-- wzc1989 已使用 Sparrow (http://www.sparrowmailapp.com/?sig) 在 2013年7月2日星期二,上午8:24,Navis류승우 写道: > Yes, a little bit. > > IMHO, these flags could be assigned only for aliases with condition on > 'on' clause. Then, I think, even a byte (8 flags) could be enough in > most cases. > > I'll do that if time permits. > > 2013/7/1 wzc1989 <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > hi navis: > > look at the patches in (HIVE-3411, HIVE-4206, HIVE-4212, HIVE-3464), I > > understand what you mean by "hive tags rows a filter mask as a short for > > outer join, which can contain 16 flags. " . I wonder why not choose Long or > > int which can contain 64/32 tags. Does adding one Long/int in every row cost > > too much? > > > > -- > > wzc1989 > > 已使用 Sparrow > > > > 在 2013年5月14日星期二,下午2:17,Navis류승우 写道: > > > > In short, hive tags rows a filter mask as a short for outer join, > > which can contain 16 flags. (see HIVE-3411, plz) > > > > I'll survey for a solution. > > > > 2013/5/14 wzc1989 <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > > > "hive cannot merge joins of 16+ aliases with outer join into single stage." > > In our use case we use one table full outer join all other table to produce > > one big table, which may exceed 16 outer join limits and will be split into > > multi stage under hive 0.10. > > It become very slow under hive 0.10 while we run such query well under hive > > 0.9. > > I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot > > merge join 16+ aliases into single stage while hive 0.9 doesn't have such > > issue. could you explain this or give me some hint? > > > > Thanks! > > > > -- > > wzc1989 > > 已使用 Sparrow > > > > 在 2013年5月14日星期二,下午12:26,Navis류승우 写道: > > > > The error message means hive cannot merge joins of 16+ aliases with > > outer join into single stage. It was 8 way originally (HIVE-3411) but > > expanded to 16 later. > > > > Check https://issues.apache.org/jira/browse/HIVE-3411 for details. > > > > 2013/5/14 wzc1989 <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > > > This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related > > commits and the above explain result matches in hive 0.9 and hive 0.10, > > thanks! > > But I confuse about this error msg: > > > > JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer > > join(s) " + > > "cannot have more than 16 aliases"), > > > > does this mean in hive0.10 when we have more than 16 outer join the query > > plan will still have some bug? > > I test the sql below and find the explain result still diff between hive 0.9 > > and hive 0.10. > > > > explain select > > sum(a.value) val > > from default.test_join a > > left outer join default.test_join b on a.key = b.key > > left outer join default.test_join c on a.key = c.key > > left outer join default.test_join d on a.key = d.key > > left outer join default.test_join e on a.key = e.key > > left outer join default.test_join f on a.key = f.key > > left outer join default.test_join g on a.key = g.key > > left outer join default.test_join h on a.key = h.key > > left outer join default.test_join i on a.key = i.key > > left outer join default.test_join j on a.key = j.key > > left outer join default.test_join k on a.key = k.key > > left outer join default.test_join l on a.key = l.key > > left outer join default.test_join m on a.key = m.key > > left outer join default.test_join n on a.key = n.key > > left outer join default.test_join u on a.key = u.key > > left outer join default.test_join v on a.key = v.key > > left outer join default.test_join w on a.key = w.key > > left outer join default.test_join x on a.key = x.key > > left outer join default.test_join z on a.key = z.key > > > > > > -- > > wzc1989 > > 已使用 Sparrow > > > > 在 2013年3月29日星期五,上午9:34,Navis류승우 写道: > > > > The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212, > > HIVE-3464) and still not completely fixed even in trunk. > > > > Will be fixed shortly. > > > > 2013/3/29 wzc <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > > > The bug remains even if I apply the patch in HIVE-4206 :( The explain > > result hasn't change. > > > > > > 2013/3/28 Navis류승우 <navis....@nexr.com (mailto:navis....@nexr.com)> > > > > > > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206). > > > > Thanks for reporting it. > > > > 2013/3/24 wzc <wzc1...@gmail.com (mailto:wzc1...@gmail.com)>: > > > > Recently we tried to upgrade our hive from 0.9 to 0.10, but found some > > of > > our hive queries almost 7 times slow. One of such query consists > > multiple > > table outer join on the same key. By looking into the query, we found > > the > > query plans generate by hive 0.9 and hive 0.10 are different. Here is > > the > > example: > > > > testcase: > > > > use default; > > create table test_join ( > > `key` string, > > `value` string > > ); > > > > explain select > > sum(a.value) val > > from default.test_join a > > left outer join default.test_join b on a.key = b.key > > left outer join default.test_join c on a.key = c.key > > left outer join default.test_join d on a.key = d.key > > left outer join default.test_join e on a.key = e.key > > left outer join default.test_join f on a.key = f.key > > left outer join default.test_join g on a.key = g.key > > > > > > the explain of hive 0.9: > > > > STAGE DEPENDENCIES: > > > > Stage-1 is a root stage > > > > Stage-2 depends on stages: Stage-1 > > > > Stage-0 is a root stage > > > > ... > > > > Reduce Operator Tree: > > > > Join Operator > > > > condition map: > > > > Left Outer Join0 to 1 > > > > Left Outer Join0 to 2 > > > > Left Outer Join0 to 3 > > > > Left Outer Join0 to 4 > > > > Left Outer Join0 to 5 > > > > Left Outer Join0 to 6 > > > > condition expressions: > > > > 0 {VALUE._col1} > > > > 1 > > > > 2 > > > > 3 > > > > 4 > > > > 5 > > > > 6 > > > > ...... > > > > > > while the explain of hive 0.10: > > > > STAGE DEPENDENCIES: > > > > Stage-6 is a root stage > > > > Stage-1 depends on stages: Stage-6 > > > > Stage-2 depends on stages: Stage-1 > > > > Stage-0 is a root stage > > > > ... > > > > Reduce Operator Tree: > > > > Join Operator > > > > condition map: > > > > Left Outer Join0 to 1 > > > > Left Outer Join0 to 2 > > > > condition expressions: > > > > 0 {VALUE._col0} {VALUE._col1} > > > > 1 > > > > 2 > > > > ... > > > > Reduce Operator Tree: > > > > Join Operator > > > > condition map: > > > > Left Outer Join0 to 1 > > > > Left Outer Join0 to 2 > > > > Left Outer Join0 to 3 > > > > Left Outer Join0 to 4 > > > > condition expressions: > > > > 0 {VALUE._col9} > > > > 1 > > > > 2 > > > > 3 > > > > 4 > > > > .... > > > > > > It seems like hive 0.9 use only one stage/job to process all outer joins > > but > > hive 0.10 split them into two stage. When running such kind of query on > > hive0.10 in production, in the second stage of outer join process, some > > reducer stucks. > > > > I can't find any param to change the query plain , can anyone give me > > some > > hint? > > > > Thanks!