Github user matyix commented on the issue: https://github.com/apache/zeppelin/pull/2637

@echarles @zjffdu @felixcheung It absolutely makes sense to keep this PR and continue working on it. Just to re-emphasize, the goal is to enable Zeppelin to submit notebooks to a Kubernetes cluster by invoking spark-submit in `cluster` deploy mode. Please find below a few advantages cluster mode has compared to client mode:

- Client mode currently appears to have some problems - I hit exactly the same issues you described in the [PR](https://github.com/apache-spark-on-k8s/spark/pull/456) when running multiple interpreters, and I'm not sure whether these problems will be resolved and client mode will be supported (I have some PRs on the Spark-k8s fork and will catch up with the folks on this topic).
- In cluster mode the Zeppelin server and each `RemoteInterpreterServer` process (Spark Driver) run in **separate** pods, which fits Kubernetes best practices/patterns better (instead of having one monolithic RIS).
- The latest Spark Driver creates a separate K8S Service to handle Executor --> Driver connections in cluster mode, which again fits Kubernetes best practices/patterns better.
- This solution works regardless of whether the Zeppelin server runs inside or outside the cluster, provided we add an option to set up authentication info for Zeppelin.
- It uses `spark-submit` and `interpreter.sh` and simplifies the `spark-submit` command for K8S a bit. Beyond that, the PR adds `SparkK8RemoteInterpreterManagedProcess` to simplify the connection to the `RemoteInterpreterServer` in K8S clusters: it uses the K8S client to look up the Driver pod and then creates a K8S Service bound to the `RemoteInterpreterServer` running inside the Driver pod (see the sketch after this comment).
- Overall this may seem a bit more complicated than client mode, but it works better and fits Kubernetes cluster best practices/patterns better.
- If you have ideas about a better place for this functionality in Zeppelin, please let me know.

Overall this is a much better and cleaner approach that fits the K8S ecosystem and at the same time has no side effects for those not willing to use K8S. I will update the PR regardless to fix the merge conflicts and add some minor changes/improvements - I am using this PR extensively on a few large K8S clusters, and it fits our needs on K8S and complies with our K8S cluster standards/best practices.
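For illustration only, here is a minimal sketch (not the PR's actual `SparkK8RemoteInterpreterManagedProcess` code) of how a driver pod could be located and a Service bound to it with the fabric8 Kubernetes client. The `spark-role=driver` label, the `zeppelin` namespace, the Thrift port, and the service name are assumptions for the example and may differ from what the PR or the Spark-on-K8s fork actually uses.

```java
import io.fabric8.kubernetes.api.model.IntOrString;
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.Service;
import io.fabric8.kubernetes.api.model.ServiceBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

import java.util.List;

public class DriverServiceSketch {
  public static void main(String[] args) {
    String namespace = "zeppelin";   // hypothetical namespace
    int thriftPort = 30000;          // hypothetical RemoteInterpreterServer port

    try (KubernetesClient client = new DefaultKubernetesClient()) {
      // Look up the Spark driver pod; the label key/value is an assumption and
      // may differ between the apache-spark-on-k8s fork and upstream Spark.
      List<Pod> drivers = client.pods()
          .inNamespace(namespace)
          .withLabel("spark-role", "driver")
          .list()
          .getItems();
      if (drivers.isEmpty()) {
        throw new IllegalStateException("No Spark driver pod found in " + namespace);
      }
      Pod driver = drivers.get(0);

      // Create a Service whose selector matches the driver pod's labels, so the
      // Zeppelin server can reach the RemoteInterpreterServer through a stable name.
      Service svc = new ServiceBuilder()
          .withNewMetadata()
            .withName(driver.getMetadata().getName() + "-ri")
            .withNamespace(namespace)
          .endMetadata()
          .withNewSpec()
            .addToSelector(driver.getMetadata().getLabels())
            .addNewPort()
              .withName("thrift")
              .withPort(thriftPort)
              .withTargetPort(new IntOrString(thriftPort))
            .endPort()
          .endSpec()
          .build();

      client.services().inNamespace(namespace).createOrReplace(svc);
      System.out.println("Created service " + svc.getMetadata().getName());
    }
  }
}
```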
---