GitHub user matyix opened a pull request:

    https://github.com/apache/zeppelin/pull/2637

    Add support to run Spark interpreter on a Kubernetes cluster

    ### What is this PR for?
    
    The goal of this PR is to be able to execute Spark notebooks on Kubernetes 
in cluster mode, so that the Spark driver runs inside the Kubernetes cluster - 
based on https://github.com/apache-spark-on-k8s/spark. Zeppelin uses 
`spark-submit` to start RemoteInterpreterServer, which is able to execute 
notebooks on Spark. Kubernetes-specific `spark-submit` parameters, such as the 
driver, executor, init container, and shuffle images, should be set in the 
SPARK_SUBMIT_OPTIONS environment variable. When the Spark interpreter is 
configured with a Kubernetes-specific Spark master URL (k8s://https....), 
RemoteInterpreterServer is launched inside a Spark driver pod on Kubernetes, 
so the Zeppelin server has to be able to connect to the remote server. In a 
Kubernetes cluster the best solution for this is to create a Kubernetes 
service for RemoteInterpreterServer. This is the reason for having 
SparkK8RemoteInterpreterManagerProcess - extending the functionality of 
RemoteInterpreterManagerProcess - which creates the Kubernetes service, 
maps the port of RemoteInterpreterServer in the driver pod, and connects 
to this service once the Spark driver pod is in the Running state.
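    As a rough sketch of the configuration described above, the 
Kubernetes-specific options could be exported before starting Zeppelin. The 
image names and namespace below are placeholders, not values taken from this 
PR, and the `spark.kubernetes.*` keys follow the apache-spark-on-k8s fork's 
conventions:

    ```shell
    # Hypothetical sketch: image names and namespace are placeholders.
    # The driver, executor, init-container and shuffle images are passed
    # to spark-submit via SPARK_SUBMIT_OPTIONS, as the PR describes.
    export SPARK_SUBMIT_OPTIONS="--deploy-mode cluster \
      --conf spark.kubernetes.driver.docker.image=example/spark-driver:latest \
      --conf spark.kubernetes.executor.docker.image=example/spark-executor:latest \
      --conf spark.kubernetes.initcontainer.docker.image=example/spark-init:latest \
      --conf spark.kubernetes.shuffle.namespace=default"
    ```

    The Spark interpreter's master property would then point at the cluster's 
API server (a `k8s://https://...` URL) so that the driver pod, and with it 
RemoteInterpreterServer, is scheduled inside the cluster.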
    
    ### What type of PR is it?
    Feature
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-3020
    
    ### How should this be tested?
    Unit and functional tests - running notebooks on Spark on K8S.
    
    ### Questions:
    * Do the license files need an update?
    * Are there breaking changes for older versions?
    * Does this need documentation?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/banzaicloud/zeppelin spark-interpreter-k8s

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/2637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2637
    
----
commit 05fc456558fe0c908108c1a3e1f6996429e0f849
Author: Janos Matyas <janos.mat...@gmail.com>
Date:   2017-10-28T18:25:09Z

    add ability to run Spark on Kubernetes cluster

----

