[ https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147714#comment-14147714 ]

Rui Li commented on HIVE-7382:
------------------------------

Hi [~xuefuz],

I hit the same problem that Szehon mentioned.

After some digging, I think this is because in local-cluster mode Spark 
launches separate JVMs for the executor backends. To do so it needs to run 
some scripts to determine the proper class path (and probably other settings); 
please refer to {{CommandUtils.buildCommandSeq}}, which is called when 
{{ExecutorRunner}} tries to launch an executor backend.
Therefore local-cluster mode requires an installation of Spark, with 
spark.home or spark.test.home properly set. I think that's fine if 
local-cluster is used only for Spark's own unit tests, but it shouldn't be 
used for user applications, because it's not really "local" in the sense that 
it requires a Spark installation.
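
To make the distinction concrete, here's a minimal sketch using Spark's Java 
API (the app name and the sizing numbers are just illustrative, not from this 
issue): a {{local-cluster[N,cores,MB]}} master forks separate executor JVMs 
through Spark's launch scripts, while a plain {{local}} master keeps 
everything in the driver JVM and needs no installation.

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalClusterSketch {
  public static void main(String[] args) {
    // "local[2]" would run executors inside this JVM -- no Spark install needed.
    // "local-cluster[2,1,512]" forks 2 worker JVMs (1 core, 512 MB each);
    // the fork goes through Spark's launch scripts, which need SPARK_HOME.
    SparkConf conf = new SparkConf()
        .setMaster("local-cluster[2,1,512]")
        .setAppName("local-cluster-sketch");
    JavaSparkContext sc = new JavaSparkContext(conf);
    System.out.println(sc.parallelize(java.util.Arrays.asList(1, 2, 3)).count());
    sc.stop();
  }
}
{code}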

To verify my guess, I ran some Hive queries (not the unit tests) on Spark 
without setting spark.home. They ran fine in standalone and local modes, but I 
got the same error in local-cluster mode.
To make it work, I had to export SPARK_HOME properly. (Please note that 
setting spark.home, or spark.testing + spark.test.home, in SparkConf won't 
help.)
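
Concretely, this is roughly the variant I tried (the paths are placeholders); 
dropping this conf into the sketch above reproduces the failure unless 
SPARK_HOME is exported in the environment before the driver JVM starts:

{code:java}
SparkConf conf = new SparkConf()
    .setMaster("local-cluster[2,1,512]")
    .setAppName("spark-home-check")
    // None of these helped in my test -- the executor launch path
    // resolves the installation from the environment instead:
    .set("spark.home", "/opt/spark")
    .set("spark.testing", "true")
    .set("spark.test.home", "/opt/spark");
// Only "export SPARK_HOME=/opt/spark" before launching made it work.
if (System.getenv("SPARK_HOME") == null) {
  System.err.println("SPARK_HOME not set; executor backends will fail to launch");
}
{code}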

What's your opinion?

> Create a MiniSparkCluster and set up a testing framework [Spark Branch]
> -----------------------------------------------------------------------
>
>                 Key: HIVE-7382
>                 URL: https://issues.apache.org/jira/browse/HIVE-7382
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>              Labels: Spark-M1
>
> To automatically test Hive functionality over the Spark execution engine, we 
> need to create a test framework that can execute Hive queries with Spark as 
> the backend. For that, we should create a MiniSparkCluster, similar to what 
> exists for other execution engines.
> Spark has a way to create a local cluster with a few processes on the local 
> machine, each process being a worker node. It's fairly close to a real Spark 
> cluster. Our mini cluster can be based on that.
> For more info, please refer to the design doc on wiki.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
