[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

Rui Li (JIRA) Mon, 08 Jan 2018 20:45:43 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317692#comment-16317692
 ]


Rui Li commented on HIVE-16484:
-------------------------------

bq. Hive wouldn't need a separate Spark installation to be able to launch Spark 
apps. It could ship with everything ready to run HoS out of the box.
Yeah, I also believe that's the main benefit. But if {{SparkLauncher}} cannot 
give us that, why don't we just use {{InProcessLauncher}}?

Regarding the extra connection, I'm not sure how it impacts us 
performance-wise. My main concern is that it adds more ways for things to go 
wrong while the benefits are not quite clear. For example, we had several 
connection timeout issues with the RPC framework, and it seems 
{{LauncherServer}}/{{LauncherBackend}} have very similar configs to tweak, like 
{{spark.launcher.childConnectionTimeout}}.

Regarding debugging, I assume it's mainly for yarn-client mode, right? The 
process we launch in yarn-cluster mode is only a lightweight client talking 
to the RM, and by default it exits once the app starts running (HIVE-13895). I 
agree it makes debugging easier, but again that requires {{InProcessLauncher}}.

So my suggestion is we wait until InProcessLauncher is released and implement 
another SparkClient using it. We can decide whether to get rid of the current 
SparkClientImpl when InProcessLauncher is mature. Does that make sense?

BTW, are there any docs on the {{SparkLauncher}} implementation? I just want to 
have a better understanding of it.

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> --------------------------------------------------------------------
>
>                 Key: HIVE-16484
>                 URL: https://issues.apache.org/jira/browse/HIVE-16484
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



