[ https://issues.apache.org/jira/browse/HIVE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122192#comment-14122192 ]
Hive QA commented on HIVE-7958:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666605/HIVE-7958-spark.patch

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 6291 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
{noformat}

Test results:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/testReport
Console output:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/console
Test logs:
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-112/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666605

> SparkWork generated by SparkCompiler may require multiple Spark jobs to run
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-7958
>                 URL: https://issues.apache.org/jira/browse/HIVE-7958
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Critical
>              Labels: Spark-M1
>         Attachments: HIVE-7958-spark.patch
>
>
> A SparkWork instance may currently contain disjoint work graphs. For
> instance, union_remove_1.q may generate a plan like this:
> {code}
> Reduce2 <- Map 1
> Reduce4 <- Map 3
> {code}
> The SparkPlan instance generated from this work graph contains two result
> RDDs. When such a plan is executed, we call .foreach() on the two RDDs
> sequentially, which results in two Spark jobs, one after the other.
> While this works functionally, performance suffers because the Spark jobs
> run sequentially rather than concurrently.
> Another side effect is that the corresponding SparkPlan instance is
> over-complicated.
> There are two potential approaches:
> 1. Let SparkCompiler generate a work that can be executed in ONE Spark job
> only. In the above example, two Spark tasks should be generated.
> 2. Let SparkPlanGenerator generate multiple Spark plans and then have
> SparkClient execute them concurrently.
> Approach #1 seems more reasonable and fits our architecture naturally.
> Also, Hive's task execution framework already takes care of task
> concurrency.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
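
For illustration only, approach #1 in the quoted description amounts to splitting a disjoint work graph into its connected components, so that each component could become its own Spark task. The sketch below is a minimal, self-contained version of that idea. It is not Hive's code: the class `WorkGraphSplitter`, the method `split`, and the string-based graph representation are all hypothetical, and the real SparkWork and SparkCompiler classes are not used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper; it does not exist in Hive. */
final class WorkGraphSplitter {

    /**
     * Groups the vertices of a work graph into weakly connected components.
     * The edges map (an assumed representation) goes from a child work to
     * the parent works it reads from, e.g. "Reduce2" -> ["Map 1"].
     */
    static List<Set<String>> split(Map<String, List<String>> edges) {
        // Build an undirected adjacency map, since edge direction does not
        // matter when deciding whether two works belong together.
        Map<String, Set<String>> adj = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            adj.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
            for (String parent : e.getValue()) {
                adj.get(e.getKey()).add(parent);
                adj.computeIfAbsent(parent, k -> new TreeSet<>()).add(e.getKey());
            }
        }

        // Depth-first search from every vertex that has not been visited yet;
        // each search collects exactly one component.
        List<Set<String>> components = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : adj.keySet()) {
            if (!seen.add(start)) {
                continue;
            }
            Set<String> component = new TreeSet<>();
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                String work = stack.pop();
                component.add(work);
                for (String neighbor : adj.get(work)) {
                    if (seen.add(neighbor)) {
                        stack.push(neighbor);
                    }
                }
            }
            components.add(component);
        }
        return components;
    }
}
```

Applied to the plan quoted above, the two edges yield two components, one containing Map 1 and Reduce2 and one containing Map 3 and Reduce4, which matches the "two Spark tasks" outcome the description asks for.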