MalcolmSanders created FLINK-13660:
--------------------------------------

             Summary: Cannot submit job on Flink session cluster on kubernetes 
with multiple JM pods (zk HA) through web frontend
                 Key: FLINK-13660
                 URL: https://issues.apache.org/jira/browse/FLINK-13660
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination, Runtime / Web Frontend
    Affects Versions: 1.9.0
            Reporter: MalcolmSanders


Hi, all,

I have recently been testing the HighAvailabilityService of Flink 1.9 on k8s.
When testing a Flink session cluster with 3 JM pods deployed on k8s, I find
that the jar I previously uploaded to the web frontend keeps disappearing from
the "Uploaded Jars" web page. As a result, it's hard to submit the job.

After investigation, I find that it is caused by the combination of (1) the 
implementation of the "handleRequest" method of the "JarListHandler" and 
"JarUploadHandler" RestHandlers and (2) the routing mechanism of the k8s 
service.

(1) It seems to me that the "handleRequest" method should dispatch the request 
through "DispatcherGateway gateway" to the leader JM. However, the two 
RestHandlers don't use the gateway and just do things locally. That is to say, 
if an "upload jar" request or a "list uploaded jars" request is sent to any of 
the 3 JMs, the web frontend will only store jars in, or fetch jars from, that 
JM's local directory.

(2) I use a k8s service to open the Flink web page. The URL pattern is (PS: 
start "kubectl proxy" locally first): 
http://127.0.0.1:8001/api/v1/namespaces/${my_ns}/services/${my_session_cluster_service}:ui/proxy/#/submit
Since there are 3 endpoints (3 JMs) behind this k8s service, the k8s routing 
mechanism randomly chooses which endpoint (JM) each REST request is sent to.

As a result of these two factors, a Flink session cluster currently cannot be 
deployed with multiple JMs using HighAvailabilityService on k8s.
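
To make the interaction of the two factors concrete, here is a small toy 
simulation. It is only a sketch: the JobManager and RandomRoutingService 
classes and the simulate function are hypothetical names invented for 
illustration, not Flink or Kubernetes APIs, and the uniform random choice of 
endpoint is an assumption taken from the description above.

```python
import random


class JobManager:
    """Toy stand-in for one JM pod; jars live only in its local directory."""

    def __init__(self, name):
        self.name = name
        self.local_jars = set()

    def upload_jar(self, jar_name):
        # Mirrors the reported behaviour: stored locally, never forwarded.
        self.local_jars.add(jar_name)

    def list_jars(self):
        # Mirrors the reported behaviour: only the local directory is read.
        return sorted(self.local_jars)


class RandomRoutingService:
    """Toy stand-in for a service that picks one endpoint per request."""

    def __init__(self, endpoints, seed=0):
        self.endpoints = endpoints
        self.rng = random.Random(seed)

    def route(self):
        return self.rng.choice(self.endpoints)


def simulate(num_jms=3, num_list_requests=3000, seed=0):
    """Upload one jar, then measure how often a list request can see it."""
    jms = [JobManager(f"jm-{i}") for i in range(num_jms)]
    service = RandomRoutingService(jms, seed)
    service.route().upload_jar("job.jar")
    visible = sum(
        "job.jar" in service.route().list_jars()
        for _ in range(num_list_requests)
    )
    return visible / num_list_requests, jms


if __name__ == "__main__":
    fraction, _ = simulate()
    print(f"jar visible in {fraction:.0%} of list requests")
```

With 3 JMs, the jar ends up on exactly one of them, so roughly one list 
request in three shows it. That matches the jar appearing and disappearing on 
page refresh.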

Proposals:
(1) redirect jar-related REST requests to the leader JM
(2) (along with proposal (1)) synchronize jar files to the standby JMs in case 
a standby JM takes over the leadership
(3) support uploading jars to a global filesystem (e.g. a DFS)
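
Proposal (1) could be sketched roughly as follows. This is a toy model under 
stated assumptions, not Flink code: ForwardingJobManager, make_cluster and 
elect are hypothetical names, and leader election is reduced to setting a 
reference.

```python
class ForwardingJobManager:
    """Toy JM that forwards jar requests to the current leader (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.local_jars = set()
        self.leader = None  # assigned by elect(); real election is not modelled

    def is_leader(self):
        return self.leader is self

    def upload_jar(self, jar_name):
        if not self.is_leader():
            # Proposal (1): a standby JM redirects the request to the leader.
            return self.leader.upload_jar(jar_name)
        self.local_jars.add(jar_name)
        return self.name  # name of the JM that actually stored the jar

    def list_jars(self):
        if not self.is_leader():
            return self.leader.list_jars()
        return sorted(self.local_jars)


def elect(jms, leader_index):
    """Point every JM at the same leader (stand-in for ZooKeeper election)."""
    for jm in jms:
        jm.leader = jms[leader_index]


def make_cluster(num_jms=3, leader_index=0):
    jms = [ForwardingJobManager(f"jm-{i}") for i in range(num_jms)]
    elect(jms, leader_index)
    return jms


if __name__ == "__main__":
    cluster = make_cluster()
    cluster[2].upload_jar("job.jar")           # request lands on a standby JM
    print([jm.list_jars() for jm in cluster])  # every JM answers via the leader
    elect(cluster, 1)                          # leadership moves to jm-1
    print(cluster[0].list_jars())              # jar was never copied to jm-1
```

In this sketch every JM gives the same answer while the leader is stable, but 
the jar is no longer listed after leadership moves. That gap is what proposal 
(2) or proposal (3) would need to cover.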



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
