Hi Fabrizio,
We have two connections. First, the Zeppelin interpreter opens a
connection to the Zeppelin server to register itself and to send back
the interpreter output. Here the Zeppelin server is the CALLBACK_HOST,
and PORT is the port on which the Zeppelin server opened its Thrift
service for the Zeppelin interpreter.
Second, and an important part of the registration: the Zeppelin
interpreter tells the Zeppelin server on which port the interpreter pod
has opened its own Thrift server. This information can be found in the
Zeppelin server log output. Be on the lookout for this message:
https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
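To make the firewall requirements concrete, the two connections could be checked roughly like this (the host/port variables below are placeholders for this sketch, not actual Zeppelin settings):

```shell
# 1) Interpreter pod -> Zeppelin server: during startup the interpreter
#    connects to CALLBACK_HOST:PORT, the Thrift service the Zeppelin
#    server opened for interpreter registration and output.
nc -vz "$CALLBACK_HOST" "$PORT"

# 2) Zeppelin server -> interpreter pod: after registration the server
#    connects to the host:port the interpreter reported, i.e. the Thrift
#    server the interpreter opened on a random port inside its pod.
nc -vz "$INTERPRETER_POD_IP" "$INTERPRETER_THRIFT_PORT"
```

So traffic must be allowed in both directions: pod to server for the callback/registration, and server to pod for the interpreter Thrift server.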
Also note the ZEPPELIN_K8S_PORTFORWARD option, which should help your
Zeppelin server reach the Zeppelin interpreter in K8s.
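As far as I understand it, with port forwarding enabled the Zeppelin server tunnels the interpreter's Thrift port instead of connecting to the pod IP directly. The effect is comparable to a manual port-forward (the pod name and ports below are placeholders):

```shell
# Roughly what the launcher achieves when port forwarding is enabled:
# a local port on the Zeppelin server host is forwarded to the
# interpreter's Thrift port inside the pod.
kubectl port-forward pod/<interpreter-pod-name> <local-port>:<thrift-port>

# The Zeppelin server then talks to localhost:<local-port>, so no direct
# route from the server to the pod network is needed.
```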
> the 1st "spark-submit" in "cluster mode" is started from the client
> (in the zeppelin host, in our case), then the 2nd "spark-submit" in
> "client mode" is started by the "/opt/entrypoint.sh" script inside the
> standard spark docker image.
Are you sure you are using the K8s launcher? As you can see in this part
of the code
(https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411),
Zeppelin always uses client mode.
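If you want to double-check that the K8s launcher is actually used, you can pin the run mode explicitly instead of relying on auto-detection (to the best of my knowledge, "auto" only selects k8s when Zeppelin itself runs inside a Kubernetes pod, which is not your case):

```shell
# In conf/zeppelin-env.sh (or set zeppelin.run.mode in zeppelin-site.xml):
export ZEPPELIN_RUN_MODE=k8s   # possible values: auto | local | k8s
```

Since your Zeppelin server runs outside the cluster, I would not rely on auto-detection here.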
The architecture is quite simple:
Zeppelin-Server -> Zeppelin-Interpreter (with Spark in client mode) on
K8s -> x-Spark-executors (based on your config)
Best Regards
Philipp
Am 27.10.21 um 15:19 schrieb Fabrizio Fab:
Hi Philipp, okay, I just now realized my HUGE misunderstanding!
The "double-spark-submit" pattern is just the standard spark-on-k8s way of
running spark applications in cluster mode:
the 1st "spark-submit" in "cluster mode" is started from the client (in the zeppelin host, in our case), then
the 2nd "spark-submit" in "client mode" is started by the "/opt/entrypoint.sh" script inside the
standard spark docker image.
At this point I can make a more precise question:
I see that interpreter.sh starts the RemoteInterpreterServer with, in
particular, the following parameters: CALLBACK_HOST / PORT.
These refer to the Zeppelin host and its RPC port.
Moreover, when the interpreter starts, it runs a Thrift server on some random
port.
So, I ask: which communications are supposed to happen, so that I can
correctly set up my firewall/routing rules?
1) Must the Zeppelin server connect to the interpreter Thrift server?
2) Must the interpreter Thrift server connect to the Zeppelin server?
3) Both?
And which ports must the Zeppelin server / the Thrift server find open
on the other side?
Thank you everybody!
Fabrizio
On 2021/10/26 11:40:24, Philipp Dallig <philipp.dal...@gmail.com> wrote:
Hi Fabrizio,
At the moment I think Zeppelin does not support running Spark jobs in
cluster mode. But in fact the K8s mode simulates cluster mode, because
the Zeppelin interpreter is already started as a pod inside K8s, just as
a manual spark-submit in cluster mode would start the driver inside the
cluster.
spark-submit is called only once, during the start of the Zeppelin
interpreter. You will find the call in these lines:
https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
Best Regards
Philipp
Am 25.10.21 um 21:58 schrieb Fabrizio Fab:
Dear All, I have been struggling for more than a week with the following problem.
My Zeppelin Server is running outside the k8s cluster (there is a reason for
this) and I am able to run Spark zeppelin notes in Client mode but not in
Cluster mode.
I see that, at first, a pod for the interpreter (RemoteInterpreterServer) is
created on the cluster by spark-submit from the Zeppelin host, with
deployMode=cluster (and this happens without errors), then the interpreter
itself runs another spark-submit (this time from the Pod) with
deployMode=client.
To be exact, the following is the command line submitted by the interpreter from
its pod:
/opt/spark/bin/spark-submit \
--conf spark.driver.bindAddress=<ip address of the interpreter pod> \
--deploy-mode client \
--properties-file /opt/spark/conf/spark.properties \
--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
spark-internal \
<ZEPPELIN_HOST> \
<ZEPPELIN_SERVER_RPC_PORT> \
<interpreter_name>-<user name>
At this point, the interpreter Pod remains in "Running" state, while the Zeppelin note
remains in "Pending" forever.
The log of the Interpreter (level = DEBUG) at the end only says:
INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread}
RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip address of the
interpreter pod>:<random port>
INFO [2021-10-25 18:16:58,229] ({RegisterThread}
RemoteInterpreterServer.java[run]:592) Start registration
INFO [2021-10-25 18:16:58,332] ({RegisterThread}
RemoteInterpreterServer.java[run]:606) Registering interpreter process
INFO [2021-10-25 18:16:58,356] ({RegisterThread}
RemoteInterpreterServer.java[run]:608) Registered interpreter process
INFO [2021-10-25 18:16:58,356] ({RegisterThread}
RemoteInterpreterServer.java[run]:629) Registration finished
(I replaced the true ip and port with a placeholder to make the log more clear
for you)
I am stuck at this point...
Can anyone help me? Thank you in advance. Fabrizio