Hi Sungwoo,

Personally, I would also be interested in seeing the results of these
experiments if they are available somewhere.

It is not clear to me whether the queries are failing at runtime or at
compile time. Are the above errors the only ones that you're getting?
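
One rough way to tell the two apart is to look at which Hive frames appear
in each trace. In the traces quoted below, the first passes through
Executor.execute and TezTask.execute, while the second passes through
Compiler.analyze and SemanticAnalyzer. A minimal sketch in Python, assuming
those frame names are reliable markers (the marker lists and the guess_phase
helper are illustrative, not part of Hive):

```python
# Heuristic only: the marker strings are copied from the two stack traces
# quoted in this thread. They are an assumption, not a Hive API.
EXECUTION_MARKERS = (
    "ql.Executor.execute",
    "ql.exec.tez.TezTask.execute",
)
COMPILE_MARKERS = (
    "ql.Compiler.analyze",
    "ql.parse.SemanticAnalyzer",
    "ql.parse.CalcitePlanner",
)


def guess_phase(stack_trace: str) -> str:
    """Guess whether a trace comes from query compilation or execution."""
    if any(marker in stack_trace for marker in EXECUTION_MARKERS):
        return "execution"
    if any(marker in stack_trace for marker in COMPILE_MARKERS):
        return "compilation"
    return "unknown"


if __name__ == "__main__":
    frame = "at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)"
    print(guess_phase(frame))
```

Running the failing statements under EXPLAIN would be a more direct check,
since EXPLAIN compiles the query without launching the Tez DAG.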

If you can reproduce the problem with a smaller dataset, then I think the
best approach would be to create unit tests and JIRAs for each query
separately.

It may not be worth going through the commits to find the ones that caused
the regression, because it will be time-consuming and you may bump into
something that is not trivial to revert.
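
For a sense of scale, should a search be attempted anyway: a binary search
over the 702 commits needs at most about 10 builds per failure signature.
The sketch below assumes a single regressing commit per failure and that
every midpoint commit can be built and tested, which is exactly the part
that may not hold here. The first_bad function and the is_bad callback are
hypothetical stand-ins for "build this commit and run the query".

```python
import math


def first_bad(n_commits, is_bad):
    """Binary-search commit indices 0..n_commits-1 for the first bad one.

    Assumes the last commit is known to be bad and that commits are
    good up to some point and bad from there on.
    Returns (index of first bad commit, number of commits tested).
    """
    lo, hi = 0, n_commits - 1
    tested = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(mid):
            hi = mid        # regression is at mid or earlier
        else:
            lo = mid + 1    # regression is after mid
    return lo, tested


if __name__ == "__main__":
    # Hypothetical: pretend the regression landed at commit index 400.
    index, builds = first_bad(702, lambda i: i >= 400)
    print(index, builds, math.ceil(math.log2(702)))
```

git bisect automates the same procedure, and git bisect skip can step past
commits that fail to build.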

Best,
Stamatis


On Wed, Nov 4, 2020 at 7:24 PM Sungwoo Park <glap...@gmail.com> wrote:

> Hello,
>
> I have tested a recent commit of the master branch using the TPC-DS
> benchmark. I used Hive on Tez (not Hive-LLAP). I tested as follows:
>
> 1) create a database consisting of external tables from a 100GB TPC-DS text
> dataset
> 2) create a database consisting of ORC tables from the previous database
> 3) compute column statistics
> 4) run TPC-DS queries and check the results
>
> Previously we tested the commit 5f47808c02816edcd4c323dfa25194536f3f20fd
> (HIVE-23114: Insert overwrite with dynamic partitioning is not working
> correctly with direct insert, Fri Apr 10), and all queries ran okay.
>
> This time I used the following commits. I made a few changes to pom.xml of
> both Hive and Tez, but these changes should not affect the result of
> running queries.
>
> 1) Hive, master, 96aacdc50043fa442c2277b7629812e69241a507 (Tue Nov
> 3), HIVE-24314: compactor.Cleaner should not set state mark cleaned if it
> didn't remove any files
> 2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (Thu Oct
> 8), CHANGES.txt updated with TEZ-4238
>
> The result is that 14 of the 99 queries fail, and each of them fails
> during compilation for one of the following two reasons.
>
> 1)
> org.apache.hive.service.cli.HiveSQLException: Error while compiling
> statement: FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Edge [Map 12 :
> org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] -> [Map 7 :
> org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
> org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
> }) already defined!
>   at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:241)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:88)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:325)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:343)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Edge [Map 12 :
> org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] -> [Map 7 :
> org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
> org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
> }) already defined!
>   at org.apache.tez.dag.api.DAG.addEdge(DAG.java:297)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:519)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:213)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228)
>   ... 11 more
>
> 2)
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4491)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4474)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10940)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10882)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11776)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11646)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3386)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3484)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10830)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11776)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11636)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11636)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11633)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11660)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11646)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12428)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12539)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
>
> There are 702 commits between the last commit of April 10 and the recent
> commit, and I guess some of these commits introduce the bugs shown above. I
> could perform a manual search to locate the commits, but it is not always
> easy to build Hive using a particular commit (mostly because of version
> conflicts with Tez, Hadoop, Guava, or others).
>
> If anybody can suggest specific commits to test, or guide me in
> bug-hunting, please let me know. There is another bug that shows itself
> when computing column statistics, but we can suppress it by using default
> values in tez-site.xml.
>
> Thanks,
>
> --- Sungwoo
>
