Hi Zoltan,

I have run another fresh TPC-DS test using the latest commit. Here is the summary:

Commits used:
1) Hive, master, e9f72e654750de208227d46a22e983413b080c6c (HIVE-24366, Thu Nov 12)
2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (CHANGES.txt updated with TEZ-4238, Thu Oct 8)

Scenario:
1) create a database consisting of external tables from a 100GB TPC-DS text dataset
2) create a database consisting of ORC tables
3) compute column statistics, set tez.runtime.compress=false
4) run TPC-DS queries and check the results

Configuration:
1) set hive.execution.engine=tez, hive.execution.mode=container
2) set hive.cbo.enable=true

Experiment #1: hive.optimize.shared.work.dppunion=true

Query 2 fails:
java.lang.IllegalArgumentException: Edge [Reducer 9 : org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 6 : org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined!

Query 14 fails:
org.apache.hadoop.hive.ql.parse.SemanticException: EXCEPT and INTERSECT operations are only supported with Cost Based Optimizations enabled. Please set 'hive.cbo.enable' to true!

Query 59 fails:
java.lang.IllegalArgumentException: Edge [Reducer 6 : org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 4 : org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined!

Experiment #2: hive.optimize.shared.work.dppunion=false

Query 14 fails:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException EXCEPT and INTERSECT operations are only supported with Cost Based Optimizations enabled. Please set 'hive.cbo.enable' to true!

Summary:
1. With hive.optimize.shared.work.dppunion=true, queries 2 and 59 fail. Please see the attachment for stack traces.
2. Query 14 fails in both cases, and it seems like another bug. Note that hive.cbo.enable is set to true when running query 14.
3. For some queries, the number of rows is different between the two experiments. In most cases, it seems to be rounding errors, but the difference is rather large for some queries (e.g., query 29 and 58). Please see the attachment for the result.

I could open a new Jira for this issue, or create a sub-task of HIVE-24384. Or perhaps HIVE-24384 is already enough. So please let me know which would be good for you.

(I have automated the entire experiment, so if you would like to see the result of testing a new commit, I would be happy to rerun the experiment and get back to you.)

Cheers,

--- Sungwoo

On Thu, Nov 12, 2020 at 10:49 PM Zoltan Haindrich <k...@rxd.hu> wrote:

> Hey Sungwoo!
>
> On 11/12/20 10:23 AM, Sungwoo Park wrote:
> > Hi Zoltan,
> >
> > I used the same hive-site.xml for the previous test (which was okay) and
> > the new test (which failed), so my guess is that it is perhaps due to a
> > commit since the previous test. Let me try later to identify the commit
> > that fails query 14, with the hope that identifying such a commit might be
> > useful in debugging.
>
> That would definitely help - if you could share the 2 commit hashes; it
> might be possible that we could guess it from the commit message or
> something.
>
> > Another question: is HIVE-24360 part of a solution to the problem of
> > hive.optimize.shared.work.dppunion?
> > I have tried the latest commit (which includes HIVE-24360) using the TPC-DS
> > benchmark, and it seems like the problem still exists.
>
> Yes, HIVE-24360 should have fixed that - do you still see an exception
> coming from tez-api reporting edge errors?
> I will also pick these changes for a smaller benchmark run soon...but I'm
> not running any right now. Could you also note for which query you've seen
> the exception - so that I could also check it.
> Could you please open a jira about this - and add the actual exception
> trace/etc if available?
>
> cheers,
> Zoltan
>
> > Cheers,
> >
> > --- Sungwoo
> >
> > On Mon, Nov 9, 2020 at 6:18 PM Zoltan Haindrich <k...@rxd.hu> wrote:
> >
> >> Hey Sungwoo!
> >>
> >> Regarding Q14 / "java.lang.RuntimeException: equivalence mapping violation"
> >>
> >> From the stack trace you shared it seems like the mapper has already
> >> seen both the filter and the ast node earlier - and they are in separate
> >> mapping groups. (Which is unfortunate) I think it won't be simple to
> >> track that down - it will definitely need some debugging.
> >> The best would be to have a repro query for it...
> >>
> >> note: we already run q14 in TestTezPerf*Driver - could it be
> >> possible that we've disabled some features in the hive-site.xml for these
> >> tests; and that's why we haven't seen it before?
> >>
> >> cheers,
> >> Zoltan
1) Hive, master, e9f72e654750de208227d46a22e983413b080c6c (HIVE-24366, Thu Nov 12)
2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (CHANGES.txt updated with TEZ-4238, Thu Oct 8)

1) create a database consisting of external tables from a 100GB TPC-DS text dataset
2) create a database consisting of ORC tables
3) compute column statistics, set tez.runtime.compress=false
4) run TPC-DS queries and check the results

1) set hive.execution.engine=tez, hive.execution.mode=container
2) set hive.cbo.enable=true

=== hive.optimize.shared.work.dppunion=true

Query 2 fails:
java.lang.IllegalArgumentException: Edge [Reducer 9 : org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 6 : org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined!
    at org.apache.tez.dag.api.DAG.addEdge(DAG.java:297)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:519)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:213)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:326)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228)
    at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:88)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:325)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:343)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Query 14 fails:
org.apache.hadoop.hive.ql.parse.SemanticException: EXCEPT and INTERSECT operations are only supported with Cost Based Optimizations enabled. Please set 'hive.cbo.enable' to true!
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11637)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11669)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11642)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11669)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11642)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11669)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11655)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3386)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3484)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10834)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11785)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11642)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11645)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11645)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11669)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11642)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11669)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11655)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12437)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12548)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:469)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:421)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:385)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:379)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:277)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:560)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:545)
    at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy45.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:571)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1530)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Query 59 fails:
java.lang.IllegalArgumentException: Edge [Reducer 6 : org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 4 : org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined!
    at org.apache.tez.dag.api.DAG.addEdge(DAG.java:297)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:519)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:213)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:326)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228)
    at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:88)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:325)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:343)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

=== hive.optimize.shared.work.dppunion=false

Query 14 fails:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException EXCEPT and INTERSECT operations are only supported with Cost Based Optimizations enabled. Please set 'hive.cbo.enable' to true!
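
For triage, the failure entries in the dump above could be grouped by exception class with a short script. This is only a rough sketch, assuming the log keeps the "Query N fails: <exception class>: ..." shape seen here; the function name `failures_by_exception` and the abbreviated `sample` text are made up for illustration and are not part of the test harness.

```python
import re
from collections import defaultdict

# Matches entries such as "Query 2 fails: java.lang.IllegalArgumentException: ..."
# and captures the query number and the first exception class name.
FAILURE = re.compile(r"Query (\d+) fails:\s*([\w.$]+(?:Exception|Error))")


def failures_by_exception(log_text):
    """Group failing query numbers by the first exception class reported."""
    groups = defaultdict(list)
    for query, exc in FAILURE.findall(log_text):
        groups[exc].append(int(query))
    return dict(groups)


# Abbreviated stand-in for the dump; the real messages are longer.
sample = (
    "Query 2 fails: java.lang.IllegalArgumentException: Edge already defined! "
    "Query 14 fails: org.apache.hadoop.hive.ql.parse.SemanticException: EXCEPT and INTERSECT "
    "Query 59 fails: java.lang.IllegalArgumentException: Edge already defined!"
)
print(failures_by_exception(sample))
```

Grouping this way makes it easy to see that queries 2 and 59 share the same "already defined" edge error while query 14 fails for a separate reason.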
tpcds.100gb.hive4.test.hive.optimize.shared.work.dppunion.xlsx
Description: MS-Excel 2007 spreadsheet
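
The row-count comparison between the two experiments (small differences versus large ones) could be automated roughly as follows. This is a sketch under stated assumptions: the per-query row counts are assumed to have been exported from the spreadsheet into plain dictionaries, the function name `row_count_diffs` and the 1% tolerance are hypothetical, and the numbers below are invented for illustration and are not taken from the attachment.

```python
def row_count_diffs(counts_a, counts_b, rel_tol=0.01):
    """Return {query: (rows_a, rows_b)} for queries whose row counts
    differ by more than rel_tol relative to the larger count."""
    flagged = {}
    # Only compare queries that produced a result in both experiments.
    for query in sorted(counts_a.keys() & counts_b.keys()):
        a, b = counts_a[query], counts_b[query]
        baseline = max(a, b, 1)  # avoid dividing by zero for empty results
        if abs(a - b) / baseline > rel_tol:
            flagged[query] = (a, b)
    return flagged


# Invented counts, keyed by placeholder query labels (not real TPC-DS results).
dppunion_true = {"qA": 100, "qB": 2513, "qC": 89}
dppunion_false = {"qA": 100, "qB": 2514, "qC": 60}
print(row_count_diffs(dppunion_true, dppunion_false))
```

A relative tolerance is used here so that a one-row drift on a large result is ignored while a large gap on a small result is still reported; the right threshold would depend on the actual data.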