Hi Fabrizio,

We have two connections. First, the Zeppelin interpreter opens a connection to the Zeppelin server to register itself and to send back the interpreter output. The Zeppelin server is the CALLBACK_HOST, and PORT indicates where the Zeppelin server opened its Thrift service for the Zeppelin interpreter.

An important part of the registration is that the Zeppelin interpreter tells the Zeppelin server on which port the interpreter pod has an open Thrift server. This information can be found in the Zeppelin server log output; be on the lookout for this message: https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483 Also note ZEPPELIN_K8S_PORTFORWARD, which should help your Zeppelin server reach the Zeppelin interpreter in K8s.
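Since your Zeppelin server runs outside the cluster and may not be able to reach pod IPs directly, enabling port forwarding may help. A minimal sketch of the setting, assuming the `ZEPPELIN_K8S_PORTFORWARD` environment variable (backed by the `zeppelin.k8s.portforward` property in the K8s launcher); please verify the exact name against your Zeppelin version:

```shell
# zeppelin-env.sh -- hedged sketch, not verified against every Zeppelin release:
# let the Zeppelin server reach the interpreter's Thrift port through a
# kubectl-style port forward via the K8s API server, instead of connecting
# to the pod IP directly.
export ZEPPELIN_K8S_PORTFORWARD=true
```

With this enabled, only the Zeppelin-server-to-K8s-API-server route needs to be open in your firewall for that direction.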

> the 1st "spark-submit" in "cluster mode" is started from the client (in the zeppelin host, in our case), then the 2nd "spark-submit" in "client mode" is started by the "/opt/entrypoint.sh" script inside the standard spark docker image.

Are you sure you are using the K8s launcher? As you can see in this part of the code (https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411), Zeppelin always uses client mode.

The architecture is quite simple:

Zeppelin-Server -> Zeppelin-Interpreter (with Spark in client mode) on K8s -> x-Spark-executors (based on your config)
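To answer the firewall question concretely, both directions must be open for registration to complete. A quick reachability check can be sketched with `nc`; all hostnames and ports below are placeholders, not values from this thread:

```shell
# Hedged sketch of the two connections that must work.

# 1) Interpreter -> Zeppelin server: from inside the interpreter pod,
#    the callback/RPC port of the Zeppelin server must be reachable.
kubectl exec <interpreter-pod> -- nc -zv <zeppelin-server-host> <rpc-port>

# 2) Zeppelin server -> interpreter: from the Zeppelin host, the
#    interpreter's Thrift port must be reachable. The port is random;
#    see the interpreter log line "Launching ThriftServer at <ip>:<port>".
nc -zv <interpreter-pod-ip> <thrift-port>
```

If check 2 fails because pod IPs are not routable from outside the cluster, that matches your symptom: registration succeeds (direction 1) but the note hangs, because the Zeppelin server can never call back into the interpreter.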

Best Regards
Philipp


Am 27.10.21 um 15:19 schrieb Fabrizio Fab:

Hi Philipp, okay, I just now realized my HUGE misunderstanding!

The "double-spark-submit" pattern is just the standard Spark-on-K8s way of 
running Spark applications in cluster mode:
the 1st "spark-submit" in "cluster mode" is started from the client (on the zeppelin host, in our case), then 
the 2nd "spark-submit" in "client mode" is started by the "/opt/entrypoint.sh" script inside the 
standard Spark docker image.

At this point I can ask a more precise question:

I see that interpreter.sh starts the RemoteInterpreterServer with, in 
particular, the following parameters: CALLBACK_HOST / PORT.
These refer to the Zeppelin host and its RPC port.

Moreover, when the interpreter starts, it runs a Thrift server on some random 
port.

So, I ask: which communications are supposed to happen, so that I can correctly 
set up my firewall/routing rules?

1. Must the Zeppelin server connect to the Interpreter Thrift server?
2. Must the Interpreter Thrift server connect to the Zeppelin server?
3. Both?

- Which ports must the Zeppelin server / the Thrift server find open on the 
other server?

Thank you everybody!

Fabrizio




On 2021/10/26 11:40:24, Philipp Dallig <philipp.dal...@gmail.com> wrote:
Hi Fabrizio,

At the moment I think Zeppelin does not support running Spark jobs in
cluster mode. But in fact, K8s mode simulates cluster mode, because the
Zeppelin interpreter is already started as a pod in K8s, just as a manual
spark-submit execution in cluster mode would do.

Spark-submit is called only once during the start of the Zeppelin
interpreter. You will find the call in these lines:
https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305

Best Regards
Philipp


Am 25.10.21 um 21:58 schrieb Fabrizio Fab:
Dear All, I have been struggling for more than a week with the following problem.
My Zeppelin server is running outside the K8s cluster (there is a reason for 
this), and I am able to run Spark Zeppelin notes in client mode but not in 
cluster mode.

I see that, at first, a pod for the interpreter (RemoteInterpreterServer) is 
created on the cluster by spark-submit from the Zeppelin host, with 
deployMode=cluster (and this happens without errors); then the interpreter 
itself runs another spark-submit (this time from the pod) with 
deployMode=client.

Specifically, the following is the command line submitted by the interpreter from 
its pod:

/opt/spark/bin/spark-submit \
--conf spark.driver.bindAddress=<ip address of the interpreter pod> \
--deploy-mode client \
--properties-file /opt/spark/conf/spark.properties \
--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
spark-internal \
<ZEPPELIN_HOST> \
<ZEPPELIN_SERVER_RPC_PORT> \
<interpreter_name>-<user name>

At this point, the interpreter Pod remains in "Running" state, while the Zeppelin note 
remains in "Pending" forever.

The log of the Interpreter (level = DEBUG) at the end only says:
   INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread} 
RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip address of the 
interpreter pod>:<random port>
   INFO [2021-10-25 18:16:58,229] ({RegisterThread} 
RemoteInterpreterServer.java[run]:592) Start registration
   INFO [2021-10-25 18:16:58,332] ({RegisterThread} 
RemoteInterpreterServer.java[run]:606) Registering interpreter process
   INFO [2021-10-25 18:16:58,356] ({RegisterThread} 
RemoteInterpreterServer.java[run]:608) Registered interpreter process
   INFO [2021-10-25 18:16:58,356] ({RegisterThread} 
RemoteInterpreterServer.java[run]:629) Registration finished
(I replaced the true IP and port with placeholders to make the log clearer 
for you.)

I am stuck at this point.
Can anyone help me? Thank you in advance. Fabrizio
