Hi Sungwoo,

As far as query 14 is concerned, the problem is logged in HIVE-24167 [1]. There is also a PR [2] that reproduces the problem, so it should be feasible to find the offending commit with git bisect.
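A minimal sketch of how the git bisect step could be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The build and test commands are placeholders and assumptions, not taken from the PR; they would need to be replaced with whatever actually reproduces the failure.

```python
#!/usr/bin/env python3
"""Hypothetical helper for `git bisect run`; all commands are placeholders."""
import subprocess
import sys

# Placeholder commands (assumptions): adjust to the real build and to the
# test from the PR that reproduces the query 14 failure.
BUILD_CMD = ["mvn", "-q", "install", "-DskipTests"]
TEST_CMD = ["mvn", "-q", "test", "-Dtest=PLACEHOLDER_TEST_CLASS"]

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def verdict(build_rc, test_rc):
    """Map build/test return codes to a git bisect exit code."""
    if build_rc != 0:
        return SKIP  # commit cannot be built, so it cannot be judged
    return GOOD if test_rc == 0 else BAD


def main():
    build_rc = subprocess.run(BUILD_CMD).returncode
    # Only run the test when the build succeeded.
    test_rc = subprocess.run(TEST_CMD).returncode if build_rc == 0 else 1
    return verdict(build_rc, test_rc)


# Runs only when invoked explicitly with the single argument "run".
if __name__ == "__main__" and sys.argv[1:] == ["run"]:
    sys.exit(main())
```

Saved under a hypothetical name such as bisect_check.py, it could be used after `git bisect start <bad-commit> <good-commit>` by calling `git bisect run python3 bisect_check.py run`. Returning 125 for commits that do not build keeps unbuildable revisions from being blamed for the regression.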
For queries 2 and 59, I can also reproduce the behavior that you mentioned (the same reducer appearing twice in the same mapper) in the same PR [2], and I think we should raise a JIRA ticket for it as well.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-24167
[2] https://github.com/apache/hive/pull/1347

On Wed, Nov 18, 2020 at 1:33 AM Sungwoo Park <glap...@gmail.com> wrote:

>>> 1. With hive.optimize.shared.work.dppunion=true, query 2 and 59 fail.
>>> Please see the attachment for stack traces.
>>
>> Even though the exception seems to be a recurrence of the previous issue -
>> existing checks + HIVE-24360 should have restricted all incorrect cases.
>> I built in some debug stuff while I made these patches - and it would
>> help a lot to get a peek into those; but they need to be enabled by
>> hand/etc... while I polish that a bit more - could you please share an
>> EXPLAIN FORMATTED about one of the queries failing because of that patch?
>
> Please see the attachment for the result of EXPLAIN on query 12. (EXPLAIN
> FORMATTED seems to have some problem.) Hive tries to create two broadcast
> edges from Reducer 8 to Map 6, thus raising an exception.
>
>>> 2. Query 14 fails in both cases, and it seems like another bug. Note
>>> that hive.cbo.enable is set to true when running query 14.
>>
>> I think you will find some cbo exception in the hive logs - explaining
>> why it resorts to the non-cbo path.
>
> Indeed it raises RuntimeException:
>
> 20/11/17 13:04:22 ERROR parse.CalcitePlanner: CBO failed, skipping CBO.
> java.lang.RuntimeException: equivalence mapping violation
>     at org.apache.hadoop.hive.ql.plan.mapper.PlanMapper.link(PlanMapper.java:220)
>
> Please see the attachment for the full stack trace.
>
>>> 3. For some queries, the number of rows is different between the two
>>> experiments. In most cases, it seems to be rounding errors, but the
>>> difference is rather large for some queries (e.g., query 29 and 58).
>>> Please see the attachment for the result.
>>
>> That's very odd - I've recently fixed a bug in swo which may have caused
>> issues like this (HIVE-24365); I would recommend comparing the result with
>> the whole thing off (hive.optimize.shared.work=false).
>> If you could isolate and reproduce this in a qtest I could also dig into
>> it.
>
> For now, let me report the result of testing HIVE-24366. Please see the
> attachment for the result.
>
> HIVE-24366 (e9f72e654750de208227d46a22e983413b080c6c, Thu Nov 12)
>
> TEZ-4238 (22fec6c0ecc7ebe6f6f28800935cc6f69794dad5, Thu Oct 8)
> guava.version=19.0 in pom.xml
> hadoop.version=3.1.0 in pom.xml
>
> TPC-DS 100GB ORC
>
> hive.execution.engine=tez
> hive.execution.mode=container, Tez containers are not reused across queries.
> hive.cbo.enable=true
> hive.query.reexecution.stats.persist.scope=metastore (default value)
>
> 1) hive.optimize.shared.work = false
> 2) hive.optimize.shared.work = true, hive.optimize.shared.work.dppunion = true
> 3) hive.optimize.shared.work = true, hive.optimize.shared.work.dppunion = false
>
> For each case, the first column reports the execution time and the second
> column reports the number of rows. If the number of rows is 1, it also
> reports the sum of all values in the result.
>
> Cheers,
>
> --- Sungwoo