I took a closer look at the plans of query2 and query59, and there is something odd about the semi joins that appear in them.
A possible workaround, until we fix the problem, would be to disable semi joins for these two queries:

set hive.tez.dynamic.semijoin.reduction=false;

Best,
Stamatis

On Wed, Nov 18, 2020 at 3:29 PM Stamatis Zampetakis <zabe...@gmail.com> wrote:

> Hi Sungwoo,
>
> As far as query14 is concerned, the problem is logged in HIVE-24167 [1].
> There is also a PR [2] for reproducing the problem so it should be feasible
> to find the offending commit with git bisect.
>
> For queries 2 and 59, I am also able to reproduce the behavior that you
> mentioned (same reducer appearing twice in the same mapper) in the same PR
> [2], and I think we should raise a JIRA ticket as well.
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/HIVE-24167
> [2] https://github.com/apache/hive/pull/1347
>
> On Wed, Nov 18, 2020 at 1:33 AM Sungwoo Park <glap...@gmail.com> wrote:
>
>> > 1. With hive.optimize.shared.work.dppunion=true, query 2 and 59 fail.
>>> Please see the attachment for stack traces.
>>>
>>> Even though the exception seems to be a recurrence of the previous issue -
>>> existing checks + HIVE-24360 should have restricted all incorrect cases.
>>> I built in some debug stuff while I made these patches - and it would
>>> help a lot to get a peek into those; but they need to be enabled by
>>> hand/etc...while I polish that a bit more - could you please share an
>>> EXPLAIN FORMATTED about one of the queries failing because of that patch?
>>>
>>
>> Please see the attachment for the result of EXPLAIN on query 12. (EXPLAIN
>> FORMATTED seems to have some problem.) Hive tries to create two broadcast
>> edges from Reducer 8 to Map 6, thus raising an exception.
>>
>> > 2. Query 14 fails in both cases, and it seems like another bug. Note
>>> that hive.cbo.enable is set to true when running query 14.
>>>
>>> I think you will find some cbo exception in the hive logs - explaining
>>> why it resorts to the non-cbo path.
>>>
>>
>> Indeed it raises RuntimeException:
>>
>> 20/11/17 13:04:22 ERROR parse.CalcitePlanner: CBO failed, skipping CBO.
>> java.lang.RuntimeException: equivalence mapping violation
>> at org.apache.hadoop.hive.ql.plan.mapper.PlanMapper.link(PlanMapper.java:220)
>>
>> Please see the attachment for the full stack trace.
>>
>>
>>> > 3. For some queries, the number of rows is different between the two
>>> experiments. In most cases, it seems to be rounding errors, but the
>>> difference is rather large for some queries (e.g., query 29 and 58).
>>> Please see the attachment for the result.
>>>
>>> that's very odd - I've recently fixed a bug in swo which may have caused
>>> issues like this (HIVE-24365); I would recommend to compare the result with
>>> the whole thing off (hive.optimize.shared.work=false).
>>> If you could isolate and reproduce this in a qtest I could also dig into
>>> it.
>>
>>
>> For now, let me report the result of testing HIVE-24366. Please see the
>> attachment for the result.
>>
>> HIVE-24366 (e9f72e654750de208227d46a22e983413b080c6c, Thu Nov 12)
>>
>> TEZ-4238 (22fec6c0ecc7ebe6f6f28800935cc6f69794dad5, Thu Oct 8)
>> guava.version=19.0 in pom.xml
>> hadoop.version=3.1.0 in pom.xml
>>
>> TPC-DS 100GB ORC
>>
>> hive.execution.engine=tez
>> hive.execution.mode=container, Tez containers are not reused across
>> queries.
>> hive.cbo.enable=true
>> hive.query.reexecution.stats.persist.scope=metastore (default value)
>>
>> 1) hive.optimize.shared.work = false
>> 2) hive.optimize.shared.work = true, hive.optimize.shared.work.dppunion = true
>> 3) hive.optimize.shared.work = true, hive.optimize.shared.work.dppunion = false
>>
>> For each case, the first column reports the execution time and the second
>> column reports the number of rows. If the number of rows is 1, it also
>> reports the sum of all values in the result.
>>
>> Cheers,
>>
>> --- Sungwoo
>>
>