Hello,

I have tested a recent commit of the master branch using the TPC-DS
benchmark. I used Hive on Tez (not Hive-LLAP). The test procedure was:

1) create a database consisting of external tables from a 100GB TPC-DS text
dataset
2) create a database consisting of ORC tables from the previous database
3) compute column statistics
4) run TPC-DS queries and check the results
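
As a rough illustration of steps 2 and 3, the HiveQL statements could be
generated as in the sketch below. The database names (tpcds_text,
tpcds_orc) and the three-table subset are placeholders, not the names
used in the actual run, and partitioning is ignored.

```python
# Sketch only: builds HiveQL for steps 2) and 3) of the procedure above.
# Database names and the table subset are hypothetical placeholders.
from typing import List, Sequence

TABLES = ["store_sales", "item", "date_dim"]  # small subset of the TPC-DS tables


def orc_load_statements(text_db: str, orc_db: str, tables: Sequence[str]) -> List[str]:
    """Return HiveQL that copies each text table into ORC and computes column stats."""
    stmts = [f"CREATE DATABASE IF NOT EXISTS {orc_db}"]
    for t in tables:
        # Step 2: create the ORC table from the external text table.
        stmts.append(
            f"CREATE TABLE {orc_db}.{t} STORED AS ORC AS SELECT * FROM {text_db}.{t}"
        )
        # Step 3: compute column statistics on the new ORC table.
        stmts.append(f"ANALYZE TABLE {orc_db}.{t} COMPUTE STATISTICS FOR COLUMNS")
    return stmts


if __name__ == "__main__":
    for stmt in orc_load_statements("tpcds_text", "tpcds_orc", TABLES):
        print(stmt + ";")
```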

Previously we tested the commit 5f47808c02816edcd4c323dfa25194536f3f20fd
(HIVE-23114: Insert overwrite with dynamic partitioning is not working
correctly with direct insert, Fri Apr 10), and all queries ran okay.

This time I used the following commits. I made a few changes to pom.xml of
both Hive and Tez, but these changes should not affect the result of
running queries.

1) Hive, master, 96aacdc50043fa442c2277b7629812e69241a507 (Tue Nov
3), HIVE-24314: compactor.Cleaner should not set state mark cleaned if it
didn't remove any files
2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (Thu Oct
8), CHANGES.txt updated with TEZ-4238

The result is that 14 of the 99 queries fail. Each failing query fails
during compilation for one of the following two reasons.

1)
org.apache.hive.service.cli.HiveSQLException: Error while compiling
statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Edge [Map 12 :
org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] -> [Map 7 :
org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
}) already defined!
  at
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365)
  at
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:241)
  at
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:88)
  at
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:325)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
  at
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:343)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Edge [Map 12 :
org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] -> [Map 7 :
org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
}) already defined!
  at org.apache.tez.dag.api.DAG.addEdge(DAG.java:297)
  at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:519)
  at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:213)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
  at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:326)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144)
  at
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
  at
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228)
  ... 11 more

2)
Caused by: java.lang.NullPointerException
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4491)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4474)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10940)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10882)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11776)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11646)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3386)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3484)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10830)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11776)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11636)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11636)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11646)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12428)
  at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
  at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12539)
  at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
  at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)

There are 702 commits between the April 10 commit and the recent commit,
and I guess some of these commits introduced the bugs shown above. I
could perform a manual search to locate the commits, but it is not always
easy to build Hive at a particular commit (mostly because of version
conflicts with Tez, Hadoop, Guava, or others).
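
For reference, a binary search over this range needs at most about ten
builds, since 2^10 = 1024 > 702. The sketch below shows the idea,
including how commits that cannot be built are skipped. The check
function is hypothetical: in practice it would build Hive at the given
commit, run one failing query, and report "good", "bad", or "skip". The
git bisect command automates the same search, and git bisect skip marks
commits that cannot be tested.

```python
# Sketch only: binary search for the first bad commit, tolerating commits
# that cannot be built. `check` is a hypothetical callback supplied by the user.
from typing import Callable, Dict, Sequence, Tuple

GOOD, BAD, SKIP = "good", "bad", "skip"


def bisect_commits(
    commits: Sequence[str], check: Callable[[str], str]
) -> Tuple[str, str, int]:
    """commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. Returns (last known good, first known bad,
    number of commits checked)."""
    verdicts: Dict[int, str] = {}

    def cached(i: int) -> str:
        # Check each commit at most once; builds are expensive.
        if i not in verdicts:
            verdicts[i] = check(commits[i])
        return verdicts[i]

    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Try the midpoint first, then its nearest neighbours if it is skipped.
        for i in sorted(range(lo + 1, hi), key=lambda j: abs(j - mid)):
            if cached(i) != SKIP:
                break
        else:
            break  # every commit left in the range is unbuildable
        if verdicts[i] == BAD:
            hi = i
        else:
            lo = i
    return commits[lo], commits[hi], len(verdicts)


if __name__ == "__main__":
    # Simulated history: 703 commits, the bug appears at index 400.
    history = [f"c{i}" for i in range(703)]
    result = bisect_commits(history, lambda c: BAD if int(c[1:]) >= 400 else GOOD)
    print(result)
```

If the two returned commits are not adjacent, the commits between them
were all skipped, and any of them (or the first known bad commit) may
have introduced the bug.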

If anybody can suggest specific commits to test, or guide me in
bug-hunting, please let me know. There is another bug that shows itself
when computing column statistics, but we can suppress it by using default
values in tez-site.xml.

Thanks,

--- Sungwoo
