Re: Cannot submit jobs on a HA Standalone JobManager

Julio Biason Wed, 02 May 2018 09:52:01 -0700

Hey guys and gals,

So, after a bit more digging, I found out that once HA is enabled,
`jobmanager.rpc.port` is also ignored (along with `jobmanager.rpc.address`,
which I was expecting). Because I set
`high-availability.jobmanager.port` to `50010-50015`, my RPC port also
changed (the docs made me think this would only affect the HA
communication, not ALL communication). This can be checked on the
Dashboard, under the JobManager configuration option.
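
To make the port behaviour above concrete, here is a minimal Python sketch
(not Flink code) of which ports the JobManager RPC endpoint would end up
using, assuming Flink behaves the way I observed. The helper names
(`parse_flink_conf`, `expand_port_range`, `candidate_rpc_ports`), the sample
`high-availability: zookeeper` line, the `6123` value and the `0` fallback are
assumptions for illustration; only the key names and the `50010-50015` range
come from my setup.

```python
# Hypothetical sample config; only the key names and the 50010-50015 range
# are taken from the setup described above.
SAMPLE_CONF = """
high-availability: zookeeper                        # assumed HA mode
jobmanager.rpc.address: localhost                   # ignored once HA is on
jobmanager.rpc.port: 6123                           # also ignored once HA is on
high-availability.jobmanager.port: 50010-50015
"""


def parse_flink_conf(text):
    """Parse simple 'key: value' lines (a rough stand-in for flink-conf.yaml)."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def expand_port_range(spec):
    """Expand a spec such as '50010-50015' or '6123' into a list of ports."""
    ports = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            ports.extend(range(int(low), int(high) + 1))
        else:
            ports.append(int(part))
    return ports


def candidate_rpc_ports(conf):
    """Ports the JobManager RPC endpoint may bind to, per the observation above."""
    ha_mode = conf.get("high-availability", "NONE").upper()
    if ha_mode != "NONE":
        # Observed: with HA on, jobmanager.rpc.port is not consulted.
        # Assumption: an unset HA port means 0 (any free port).
        return expand_port_range(conf.get("high-availability.jobmanager.port", "0"))
    return expand_port_range(conf.get("jobmanager.rpc.port", "6123"))


if __name__ == "__main__":
    print(candidate_rpc_ports(parse_flink_conf(SAMPLE_CONF)))
```

The practical consequence is that any firewall rule or client setting that
still points at `jobmanager.rpc.port` would be targeting a port the JobManager
is no longer listening on.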


But that didn't solve my problem. So far, `flink run` still fails with
the same message (I'm adding the full stack trace of the failure at the end,
just in case), but now I'm also seeing this message in the JobManager logs:

2018-05-02 16:44:32,373 WARN  org.apache.flink.runtime.jobmanager.JobManager                - Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,SubmitJob(JobGraph(jobId: 42a25752ab085117a21c02d3db54777e),DETACHED)) because the expected leader session ID c01eba4f-44e2-4c65-85d5-a9a05ceba28e did not equal the received leader session ID 00000000-0000-0000-0000-000000000000.
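
For anyone wanting to scan their own logs for the same symptom, here is a
minimal Python sketch that pulls the two leader session IDs out of a "Discard
message" entry and flags the case where the received ID is the all-zero UUID,
as in the message above. The function name `find_session_mismatch` and the
regular expression are my own assumptions, written against the wording of
this one log line only; other Flink versions may phrase it differently.

```python
import re
import uuid

# The all-zero UUID that shows up as the received leader session ID.
NIL_UUID = uuid.UUID(int=0)

# Pattern modelled on the single log line quoted above (an assumption,
# not an official log format).
MISMATCH_RE = re.compile(
    r"expected leader session ID (?P<expected>[0-9a-fA-F-]{36}) "
    r"did not equal the received leader session ID (?P<received>[0-9a-fA-F-]{36})"
)


def find_session_mismatch(log_text):
    """Return (expected, received, received_is_nil), or None if no match."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    match = MISMATCH_RE.search(flat)
    if match is None:
        return None
    expected = uuid.UUID(match.group("expected"))
    received = uuid.UUID(match.group("received"))
    return expected, received, received == NIL_UUID


if __name__ == "__main__":
    sample = """
    message because the expected leader
    session ID c01eba4f-44e2-4c65-85d5-a9a05ceba28e did not equal the received
    leader session ID
    00000000-0000-0000-0000-000000000000.
    """
    print(find_session_mismatch(sample))
```

A nil received ID means the submit request reached the JobManager without the
current leader session ID attached, which at least narrows the problem down to
how the client is discovering the leader.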


So, I'm still lost on where to go from here.


Failure when using `flink run`:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: JobManager did not respond within 60000 ms
        at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:524)
        at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:103)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:456)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:402)
        at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:802)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:282)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1054)
        at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1101)
        at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1098)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1098)
Caused by: org.apache.flink.runtime.client.JobTimeoutException: JobManager did not respond within 60000 ms
        at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:437)
        at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:516)
        ... 14 more
Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:435)
        ... 15 more


On Wed, May 2, 2018 at 9:52 AM, Julio Biason <[email protected]> wrote:

> Hello all,
>
> I'm building a standalone cluster with an HA JobManager. So far, everything
> seems to work, but when I try to `flink run` my job, it fails with the
> following error:
>
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException:
> Could not retrieve the leader gateway.
>
> So far, I have two different machines running the JobManager and, looking
> at the logs, I can't see any problem whatsoever to explain why the flink
> command is refusing to run the pipeline...
>
> Any ideas where I should look?
>
> --
> *Julio Biason*, Software Engineer
> *AZION*  |  Deliver. Accelerate. Protect.
> Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554
>



-- 
*Julio Biason*, Software Engineer
*AZION*  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554
