[jira] [Commented] (HIVE-7958) SparkWork generated by SparkCompiler may require multiple Spark jobs to run

Hive QA (JIRA) Thu, 04 Sep 2014 16:51:42 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122192#comment-14122192
 ]


Hive QA commented on HIVE-7958:
-------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666605/HIVE-7958-spark.patch

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 6291 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-112/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666605

> SparkWork generated by SparkCompiler may require multiple Spark jobs to run
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-7958
>                 URL: https://issues.apache.org/jira/browse/HIVE-7958
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Critical
>              Labels: Spark-M1
>         Attachments: HIVE-7958-spark.patch
>
>
> A SparkWork instance currently may contain disjointed work graphs. For 
> instance, union_remove_1.q may generated a plan like this:
> {code}
> Reduce2 <- Map 1
> Reduce4 <- Map 3
> {code}
> The SparkPlan instance generated from this work graph contains two result 
> RDDs. When such plan is executed, we call .foreach() on the two RDDs 
> sequentially, which results two Spark jobs, one after the other.
> While this works functionally, the performance will not be great as the Spark 
> jobs are run sequentially rather than concurrently.
> Another side effect of this is that the corresponding SparkPlan instance is 
> over-complicated.
> The are two potential approaches:
> 1. Let SparkCompiler generate a work that can be executed in ONE Spark job 
> only. In above example, two Spark task should be generated.
> 2. Let SparkPlanGenerate generate multiple Spark plans and then SparkClient 
> executes them concurrently.
> Approach #1 seems more reasonable and naturally fit to our architecture. 
> Also, Hive's task execution framework already takes care of the task 
> concurrency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-7958) SparkWork generated by SparkCompiler may require multiple Spark jobs to run

Reply via email to