Hi Sungwoo,

https://issues.apache.org/jira/browse/HIVE-23975 is causing a runtime regression. There is a ticket open to fix it (https://issues.apache.org/jira/browse/HIVE-24139), which is still in progress. You might want to revert HIVE-23975 before trying again.

On Wed, Nov 4, 2020 at 2:55 PM Stamatis Zampetakis <zabe...@gmail.com> wrote:

> Hi Sungwoo,
>
> Personally, I would be also interested to see the results of these
> experiments if they are available somewhere.
>
> I didn't understand if the queries are failing at runtime or compile time.
> Are the above errors the only ones that you're getting?
>
> If you can reproduce the problem with a smaller dataset then I think the
> best would be to create unit tests and JIRAS for each query separately.
>
> It may not be worth going through the commits to find those that caused the
> regression because it will be time-consuming and you may bump into
> something that is not trivial to revert.
>
> Best,
> Stamatis
>
>
> On Wed, Nov 4, 2020 at 7:24 PM Sungwoo Park <glap...@gmail.com> wrote:
>
> > Hello,
> >
> > I have tested a recent commit of the master branch using the TPC-DS
> > benchmark. I used Hive on Tez (not Hive-LLAP). The way I tested is:
> >
> > 1) create a database consisting of external tables from a 100GB TPC-DS
> > text dataset
> > 2) create a database consisting of ORC tables from the previous database
> > 3) compute column statistics
> > 4) run TPC-DS queries and check the results
> >
> > Previously we tested the commit 5f47808c02816edcd4c323dfa25194536f3f20fd
> > (HIVE-23114: Insert overwrite with dynamic partitioning is not working
> > correctly with direct insert, Fri Apr 10), and all queries ran okay.
> >
> > This time I used the following commits. I made a few changes to pom.xml of
> > both Hive and Tez, but these changes should not affect the result of
> > running queries.
> >
> > 1) Hive, master, 96aacdc50043fa442c2277b7629812e69241a507 (Tue Nov 3),
> > HIVE-24314: compactor.Cleaner should not set state mark cleaned if it
> > didn't remove any files
> > 2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (Thu Oct 8),
> > CHANGES.txt updated with TEZ-4238
> >
> > The result is that 14 queries (out of 99 queries) fail, and a query fails
> > during compilation for one of the following two reasons.
> >
> > 1)
> > org.apache.hive.service.cli.HiveSQLException: Error while compiling
> > statement: FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.tez.TezTask. Edge [Map 12 :
> > org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] -> [Map 7 :
> > org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
> > org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
> > org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
> > }) already defined!
> >     at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365)
> >     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:241)
> >     at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:88)
> >     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:325)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> >     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:343)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.IllegalArgumentException: Edge [Map 12 :
> > org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] -> [Map 7 :
> > org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
> > org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
> > org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
> > }) already defined!
> >     at org.apache.tez.dag.api.DAG.addEdge(DAG.java:297)
> >     at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:519)
> >     at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:213)
> >     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> >     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> >     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
> >     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
> >     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
> >     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
> >     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:326)
> >     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
> >     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144)
> >     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
> >     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228)
> >     ... 11 more
> >
> > 2)
> > Caused by: java.lang.NullPointerException
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4491)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4474)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10940)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10882)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11776)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11646)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3386)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3484)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10830)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11776)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11636)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11636)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11646)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12428)
> >     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
> >     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12539)
> >     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
> >     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> >     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> >
> > There are 702 commits between the last commit of April 10 and the recent
> > commit, and I guess some of these commits introduce the bugs shown above. I
> > could perform a manual search to locate the commits, but it is not always
> > easy to build Hive using a particular commit (mostly because of version
> > conflicts with Tez, Hadoop, Guava, or others).
> >
> > If anybody can suggest specific commits to test, or guide me in
> > bug-hunting, please let me know. There is another bug that shows itself
> > when computing column statistics, but we can suppress it by using default
> > values in tez-site.xml.
> >
> > Thanks,
> >
> > --- Sungwoo
> >
>
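
For a rough sense of what the manual search over the 702 commits mentioned above could cost, here is a minimal sketch of a binary search over an ordered commit list. It is only an illustration under stated assumptions: `first_bad_commit`, `max_builds`, and the `is_bad` callback are hypothetical names, not part of Hive or git; the callback stands in for "build this commit and run one failing query". The sketch also assumes a single failure is being tracked and that every commit after the culprit also fails. Since two different errors are reported, each would likely need its own search, and commits that cannot be built would have to be skipped (which `git bisect skip` does in a real bisection).

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first failing commit.

    Assumes `commits` is ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the culprit also fails.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # hypothetical "build and run the query" step
            hi = mid
        else:
            lo = mid
    return commits[hi]


def max_builds(gap: int) -> int:
    """Worst-case number of builds when the bad commit is `gap` commits
    after the known-good one: ceil(log2(gap)), computed with integers."""
    return (gap - 1).bit_length()


if __name__ == "__main__":
    # Placeholder history: one known-good commit followed by 702 later ones.
    history = [f"c{i}" for i in range(703)]
    culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 400)
    print(culprit, max_builds(702))
```

Under these assumptions, about ten builds would be enough to isolate one culprit among 702 commits, although each build may still run into the version conflicts described above.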