Janos Matyas created ZEPPELIN-3020:
--------------------------------------
Summary: Add support to run Spark interpreter on a Kubernetes cluster
Key: ZEPPELIN-3020
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3020
Project: Zeppelin
Issue Type: New Feature
Reporter: Janos Matyas
The goal of this PR is to make it possible to execute Spark notebooks on Kubernetes in cluster mode, so that the Spark driver runs inside the Kubernetes cluster. It is based on https://github.com/apache-spark-on-k8s/spark.

Zeppelin uses `spark-submit` to start RemoteInterpreterServer, which executes notebooks on Spark. Kubernetes-specific `spark-submit` parameters, such as the driver, executor, init-container, and shuffle images, should be set in the SPARK_SUBMIT_OPTIONS environment variable.

If the Spark interpreter is configured with a Kubernetes-specific Spark master URL (k8s://https....), RemoteInterpreterServer is launched inside a Spark driver pod on Kubernetes, so the Zeppelin server has to be able to connect to this remote server. In a Kubernetes cluster, the best solution for this is to create a Kubernetes service for RemoteInterpreterServer. This is the reason for SparkK8RemoteInterpreterManagerProcess, which extends the functionality of RemoteInterpreterManagerProcess: it creates the Kubernetes service that maps the port of RemoteInterpreterServer in the driver pod, and connects to this service once the Spark driver pod is in the Running state.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
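A minimal Java sketch of the launch flow described above, under stated assumptions: detect a k8s:// master URL, describe a Service that forwards the RemoteInterpreterServer port to the driver pod, wait until the pod phase is Running, then connect through the Service's cluster DNS name. Every name here (K8sInterpreterLaunchSketch, its methods, the spark-app-selector/spark-role selector labels, port 30000, the service name) is hypothetical and is not taken from Zeppelin's or Spark's code. The phase Supplier stands in for a real Kubernetes API client call.

```java
import java.time.Duration;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the ZEPPELIN-3020 launch flow.
 * None of these names come from Zeppelin; they only illustrate the steps.
 */
public final class K8sInterpreterLaunchSketch {

    // Spark-on-Kubernetes master URLs start with this scheme, e.g. "k8s://https://..."
    static final String K8S_MASTER_PREFIX = "k8s://";

    static boolean isKubernetesMaster(String sparkMaster) {
        return sparkMaster != null && sparkMaster.startsWith(K8S_MASTER_PREFIX);
    }

    /**
     * Builds a Service manifest forwarding the RemoteInterpreterServer port to the
     * driver pod. The selector label keys and values are assumptions.
     */
    static String driverServiceManifest(String serviceName, String namespace,
                                        String appSelector, int interpreterPort) {
        return String.join("\n",
            "apiVersion: v1",
            "kind: Service",
            "metadata:",
            "  name: " + serviceName,
            "  namespace: " + namespace,
            "spec:",
            "  selector:",
            "    spark-app-selector: " + appSelector, // assumed driver pod label
            "    spark-role: driver",                 // assumed driver pod label
            "  ports:",
            "  - port: " + interpreterPort,
            "    targetPort: " + interpreterPort);
    }

    /** Cluster-internal DNS name under which the Zeppelin server reaches the Service. */
    static String serviceHost(String serviceName, String namespace) {
        return serviceName + "." + namespace + ".svc.cluster.local";
    }

    /** Polls the driver pod phase until it is "Running" or the timeout expires. */
    static boolean waitForRunning(Supplier<String> podPhase, Duration timeout,
                                  Duration pollInterval) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if ("Running".equals(podPhase.get())) {
                return true;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        String master = "k8s://https://kubernetes.default.svc:443";
        if (!isKubernetesMaster(master)) {
            System.out.println("Non-Kubernetes master: use the regular interpreter process");
            return;
        }
        int port = 30000; // hypothetical RemoteInterpreterServer port
        System.out.println(driverServiceManifest("zeppelin-spark-driver", "default", "spark-app-1", port));

        // Fake phase sequence standing in for the Kubernetes API.
        String[] phases = {"Pending", "Pending", "Running"};
        int[] calls = {0};
        Supplier<String> fakePhase = () -> phases[Math.min(calls[0]++, phases.length - 1)];

        if (waitForRunning(fakePhase, Duration.ofSeconds(5), Duration.ofMillis(10))) {
            System.out.println("Connect to " + serviceHost("zeppelin-spark-driver", "default") + ":" + port);
        }
    }
}
```

Routing through a Service rather than the pod IP gives the Zeppelin server a stable DNS name that is known before the driver pod is scheduled; it only has to wait for the pod to reach Running before the port actually accepts connections.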