[ https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068253#comment-14068253 ]

Chengxiang Li commented on HIVE-7436:
-------------------------------------

[~xuefuz] Thanks for the comments. For the first question, default 
master/appname values should be added in case spark-defaults.conf is missing; 
I'll update the patch later.
{quote}
Second question: would user be able to set or change the spark configuration 
via hive's set command? I guess not, but I'd like to hear your thought.
{quote}
Here are some thoughts about this:
# Spark configuration is set at the application level, which means a user cannot 
reset Spark configuration dynamically while a Spark application is running. (A 
Spark application's lifecycle is roughly the same as the lifecycle of its 
SparkContext instance.)
# Changing Spark configuration via Hive's set command would therefore mean that 
the Spark jobs representing different Hive queries must be submitted as different 
Spark applications.
# Currently the Hive driver runs all queries in a single Spark application 
(singleton SparkClient => singleton SparkContext); see the sketch below.
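
To illustrate points 1 and 3, here is a minimal sketch of how a singleton 
SparkClient could hold a single SparkContext built once from a SparkConf; the 
class shape and method names here are hypothetical, not the actual patch code. 
Once the context is created its configuration is effectively frozen, which is 
why a later Hive set command cannot change Spark settings for the running 
application.
{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical sketch of a per-driver singleton; not the actual Hive code.
public final class SparkClient {
  private static SparkClient instance;
  private final JavaSparkContext sparkContext;

  private SparkClient() {
    // SparkConf picks up spark.* Java system properties; the defaults below
    // are only used when nothing else is configured.
    SparkConf conf = new SparkConf();
    if (!conf.contains("spark.master")) {
      conf.setMaster("local");
    }
    if (!conf.contains("spark.app.name")) {
      conf.setAppName("Hive on Spark");
    }
    // The configuration is effectively fixed once the context is created.
    this.sparkContext = new JavaSparkContext(conf);
  }

  public static synchronized SparkClient getInstance() {
    if (instance == null) {
      instance = new SparkClient();
    }
    return instance;
  }

  public JavaSparkContext getSparkContext() {
    return sparkContext;
  }
}
{code}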

So this question mostly depends on another one: should the Hive driver submit all 
queries through a single Spark application, or create a separate Spark application 
for each query?
# For a single Spark application: little submission cost, but fixed cluster 
resources for the whole Hive driver lifecycle.
# For a separate Spark application per query: more submission cost (configuration 
loading, dependency transfer, cluster resource allocation), but dynamic resource 
allocation for each query.

Shark uses a single Spark application, so it is not resource efficient: it cannot 
dynamically adjust its assigned resources as required. What do you think about 
this?

> Load Spark configuration into Hive driver
> -----------------------------------------
>
>                 Key: HIVE-7436
>                 URL: https://issues.apache.org/jira/browse/HIVE-7436
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-7436-Spark.1.patch
>
>
> Load Spark configuration into the Hive driver. There are three ways to set up 
> Spark configuration:
> # Properties in the Spark configuration file (spark-defaults.conf).
> # Java system properties.
> # System environment variables.
> Spark supports configuration through environment variables only for compatibility 
> with previous scripts, so we won't support that in Hive on Spark. Hive on Spark 
> loads defaults from Java properties, then loads properties from the configuration 
> file and overrides existing properties, as sketched below.
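> A minimal sketch of that load order, using an illustrative helper that is not 
> the actual patch code: spark.* Java system properties are collected first as 
> defaults, then entries from spark-defaults.conf override them.
> {code:java}
> import java.io.FileReader;
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Properties;
> 
> // Illustrative only: defaults come from spark.* Java system properties,
> // then spark-defaults.conf is loaded and overrides existing entries.
> public class SparkConfLoader {
>   public static Map<String, String> load(String sparkDefaultsPath) throws IOException {
>     Map<String, String> conf = new HashMap<>();
> 
>     // 1. Defaults from Java system properties, e.g. -Dspark.master=local.
>     for (String name : System.getProperties().stringPropertyNames()) {
>       if (name.startsWith("spark.")) {
>         conf.put(name, System.getProperty(name));
>       }
>     }
> 
>     // 2. spark-defaults.conf (whitespace-separated key/value pairs) overrides.
>     Properties fileProps = new Properties();
>     try (FileReader reader = new FileReader(sparkDefaultsPath)) {
>       fileProps.load(reader);
>     }
>     for (String name : fileProps.stringPropertyNames()) {
>       conf.put(name, fileProps.getProperty(name).trim());
>     }
>     return conf;
>   }
> }
> {code}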
> Configuration steps:
> # Create spark-defaults.conf and place it in the /etc/spark/conf configuration 
> directory.
>     Please refer to [http://spark.apache.org/docs/latest/configuration.html] for 
> the available spark-defaults.conf properties; a sample file is sketched after 
> these steps.
> # Set the $SPARK_CONF_DIR environment variable to the location of 
> spark-defaults.conf.
>     export SPARK_CONF_DIR=/etc/spark/conf
> # Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable.
>     export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
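> For step 1, a sample spark-defaults.conf could look like the following; the 
> property values are only placeholders and need to match the actual cluster.
> {code}
> spark.master             spark://master-host:7077
> spark.app.name           Hive on Spark
> spark.executor.memory    2g
> spark.serializer         org.apache.spark.serializer.KryoSerializer
> {code}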
> NO PRECOMMIT TESTS. This is for spark-branch only.


