[ https://issues.apache.org/jira/browse/FLINK-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clive Wong resolved FLINK-28613.
--------------------------------
    Resolution: Workaround

> PyFlink 1.15 unable to start in Application Mode in k8s
> -------------------------------------------------------
>
>                 Key: FLINK-28613
>                 URL: https://issues.apache.org/jira/browse/FLINK-28613
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.15.1
>            Reporter: Clive Wong
>            Priority: Major
>
> I recently upgraded my PyFlink job from 1.14 to 1.15, and the job now fails 
> to start with the 1.15 build in k8s.
> The error occurs because NetUtils.getAvailablePort cannot find a free port. 
> I suspect this is related to the py4j version bump from 0.10.8.1 to 0.10.9.3 
> required by the apache-flink 1.15 Python package.
> The error stack is:
> {code:java}
> 2022-07-19 11:17:06,225 INFO  
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Start SessionDispatcherLeaderProcess.
> 2022-07-19 11:17:06,226 INFO  
> org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
> Starting resource manager service.
> 2022-07-19 11:17:06,227 INFO  
> org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
> Resource manager service is granted leadership with session id 
> 00000000-0000-0000-0000-000000000000.
> 2022-07-19 11:17:06,229 INFO  
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Recover all persisted job graphs that are not finished, yet.
> 2022-07-19 11:17:06,229 INFO  
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Successfully recovered 0 persisted job graphs.
> 2022-07-19 11:17:06,306 INFO  
> org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting 
> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at 
> akka://flink/user/rpc/dispatcher_0 .
> 2022-07-19 11:17:06,309 INFO  
> org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting 
> RPC endpoint for 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at 
> akka://flink/user/rpc/resourcemanager_1 .
> 2022-07-19 11:17:06,317 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Starting the resource manager.
> 2022-07-19 11:17:06,401 INFO  org.apache.flink.client.ClientUtils             
>              [] - Starting program (detached: true)
> 2022-07-19 11:17:06,500 WARN  
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap 
> [] - Application failed unexpectedly: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at 
> org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) 
> [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
>  [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
>     at 
> java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  [?:?]
>     at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
> [?:?]
>     at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) 
> [?:?]
> Caused by: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     ... 14 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> main method caused an error: java.lang.RuntimeException: Could not find a 
> free permitted port on the machine.
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Could not find a free permitted port on the 
> machine.
>     at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) 
> ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387)
>  ~[?:?]
>     at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) 
> ~[?:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.lang.RuntimeException: Could not find a free permitted port 
> on the machine.
>     at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:834) ~[?:?]
> 2022-07-19 11:17:06,505 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error 
> occurred in the cluster entrypoint.
> java.util.concurrent.CompletionException: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at 
> org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) 
> [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
>  [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
>     at 
> java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  [?:?]
>     at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
> [?:?]
>     at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) 
> [?:?]
> Caused by: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     ... 14 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> main method caused an error: java.lang.RuntimeException: Could not find a 
> free permitted port on the machine.
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Could not find a free permitted port on the 
> machine.
>     at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) 
> ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387)
>  ~[?:?]
>     at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) 
> ~[?:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.lang.RuntimeException: Could not find a free permitted port 
> on the machine.
>     at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:834) ~[?:?]
> 2022-07-19 11:17:06,508 INFO  
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting 
> StandaloneApplicationClusterEntryPoint down with application status UNKNOWN. 
> Diagnostics Cluster entrypoint has been closed externally..
> 2022-07-19 11:17:06,509 INFO  org.apache.flink.runtime.blob.BlobServer        
>              [] - Stopped BLOB server at 0.0.0.0:6124 {code}
> The failure is the same with Python 3.7 and Python 3.8.
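
The root exception in the trace above is thrown while the Python gateway server asks for a free port. A minimal diagnostic sketch, run inside the same container, can help narrow this down. It checks whether the OS hands out an ephemeral port at all. It also checks that the temporary directory is writable, on the assumption (not confirmed in this report) that port allocation may depend on creating a lock file there. The function names are hypothetical, and Python's temporary directory may differ from the JVM's java.io.tmpdir, so pass the JVM's path explicitly if it is known.

```python
import socket
import tempfile


def can_bind_ephemeral_port(host="127.0.0.1"):
    """Ask the OS for any free port; return it, or None if binding fails."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0 lets the OS choose a free port
            return sock.getsockname()[1]
    except OSError:
        return None


def temp_dir_is_writable(path=None):
    """Return True if a file can be created in the given (or default) temp dir."""
    path = path or tempfile.gettempdir()
    try:
        # The file is removed automatically when the context exits.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    port = can_bind_ephemeral_port()
    if port is None:
        print("ephemeral port: could not bind")
    else:
        print("ephemeral port: OS assigned", port)
    print("temp dir", tempfile.gettempdir(), "writable:", temp_dir_is_writable())
```

If a port can be bound here but the job still fails, the problem is probably not a shortage of free ports on the machine, and the environment of the JobManager process is worth a closer look.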



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
