[ https://issues.apache.org/jira/browse/FLINK-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Clive Wong updated FLINK-28613:
-------------------------------
    Description:
I recently bumped my PyFlink job from 1.14 to 1.15, and the job now fails with the 1.15 build in k8s.

The error occurs because NetUtils.getAvailablePort() cannot find a free port. I suspect this is related to the py4j version bump from 0.10.8.1 to 0.10.9.3 required by the apache-flink 1.15 Python package.

The error stack is:
{code:java}
2022-07-19 11:17:06,225 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Start SessionDispatcherLeaderProcess.
2022-07-19 11:17:06,226 INFO org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - Starting resource manager service.
2022-07-19 11:17:06,227 INFO org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - Resource manager service is granted leadership with session id 00000000-0000-0000-0000-000000000000.
2022-07-19 11:17:06,229 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Recover all persisted job graphs that are not finished, yet.
2022-07-19 11:17:06,229 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs.
2022-07-19 11:17:06,306 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_0 .
2022-07-19 11:17:06,309 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/rpc/resourcemanager_1 .
2022-07-19 11:17:06,317 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Starting the resource manager.
2022-07-19 11:17:06,401 INFO org.apache.flink.client.ClientUtils [] - Starting program (detached: true)
2022-07-19 11:17:06,500 WARN org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap [] - Application failed unexpectedly:
java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
	at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
	... 14 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.lang.RuntimeException: Could not find a free permitted port on the machine.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	... 13 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could not find a free permitted port on the machine.
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
	at org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387) ~[?:?]
	at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	... 13 more
Caused by: java.lang.RuntimeException: Could not find a free permitted port on the machine.
	at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365) ~[?:?]
	at java.lang.Thread.run(Thread.java:834) ~[?:?]
2022-07-19 11:17:06,505 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
	at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
	... 14 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.lang.RuntimeException: Could not find a free permitted port on the machine.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
	... 13 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could not find a free permitted port on the machine.
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
	at org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387) ~[?:?]
	at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] ... 13 more Caused by: java.lang.RuntimeException: Could not find a free permitted port on the machine. at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365) ~[?:?] at java.lang.Thread.run(Thread.java:834) ~[?:?] 2022-07-19 11:17:06,508 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting StandaloneApplicationClusterEntryPoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally.. 2022-07-19 11:17:06,509 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124 {code} It's the same with Python3.7 & Python3.8 was: I recently bumped my PyFlink job from 1.14 to 1.15, and the job is failing with build 1.15 in k8s. The error is due to NetUtils not able to getAvailablePort. I suspect this is related to the version bump of py4j from 0.10.8.1 to 0.10.9.3 in required by apache-flink 1.15 in python. The error stack is: {code:java} 2022-07-19 11:17:06,225 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Start SessionDispatcherLeaderProcess. 2022-07-19 11:17:06,226 INFO org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - Starting resource manager service. 
2022-07-19 11:17:06,227 INFO org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - Resource manager service is granted leadership with session id 00000000-0000-0000-0000-000000000000. 2022-07-19 11:17:06,229 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Recover all persisted job graphs that are not finished, yet. 2022-07-19 11:17:06,229 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs. 2022-07-19 11:17:06,306 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_0 . 2022-07-19 11:17:06,309 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/rpc/resourcemanager_1 . 2022-07-19 11:17:06,317 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Starting the resource manager. 2022-07-19 11:17:06,401 INFO org.apache.flink.client.ClientUtils [] - Starting program (detached: true) 2022-07-19 11:17:06,500 WARN org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap [] - Application failed unexpectedly: java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application. at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?] 
at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?] at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?] at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?] at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?] at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?] Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application. ... 
14 more Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.lang.RuntimeException: Could not find a free permitted port on the machine. at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] ... 13 more Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could not find a free permitted port on the machine. at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?] at org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387) ~[?:?] at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) ~[?:?] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] 
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] ... 13 more Caused by: java.lang.RuntimeException: Could not find a free permitted port on the machine. at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365) ~[?:?] at java.lang.Thread.run(Thread.java:834) ~[?:?] 2022-07-19 11:17:06,505 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint. java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application. at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?] 
at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?] at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?] at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?] at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?] at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?] Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application. ... 
14 more Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.lang.RuntimeException: Could not find a free permitted port on the machine. at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] ... 13 more Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could not find a free permitted port on the machine. at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?] at org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387) ~[?:?] at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) ~[?:?] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] 
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] ... 13 more Caused by: java.lang.RuntimeException: Could not find a free permitted port on the machine. at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] at org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365) ~[?:?] at java.lang.Thread.run(Thread.java:834) ~[?:?] 2022-07-19 11:17:06,508 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting StandaloneApplicationClusterEntryPoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally.. 2022-07-19 11:17:06,509 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124 {code} > PyFlink 1.15 unable to start in Application Mode in k8s > ------------------------------------------------------- > > Key: FLINK-28613 > URL: https://issues.apache.org/jira/browse/FLINK-28613 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission > Affects Versions: 1.15.1 > Reporter: Clive Wong > Priority: Major > > I recently bumped my PyFlink job from 1.14 to 1.15, and the job is failing > with build 1.15 in k8s. > The error is due to NetUtils not able to getAvailablePort. I suspect this is > related to the version bump of py4j from 0.10.8.1 to 0.10.9.3 in required by > apache-flink 1.15 in python. 
> The error stack is: > {code:java} > 2022-07-19 11:17:06,225 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Start SessionDispatcherLeaderProcess. > 2022-07-19 11:17:06,226 INFO > org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - > Starting resource manager service. > 2022-07-19 11:17:06,227 INFO > org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - > Resource manager service is granted leadership with session id > 00000000-0000-0000-0000-000000000000. > 2022-07-19 11:17:06,229 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Recover all persisted job graphs that are not finished, yet. > 2022-07-19 11:17:06,229 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Successfully recovered 0 persisted job graphs. > 2022-07-19 11:17:06,306 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting > RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at > akka://flink/user/rpc/dispatcher_0 . > 2022-07-19 11:17:06,309 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting > RPC endpoint for > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at > akka://flink/user/rpc/resourcemanager_1 . > 2022-07-19 11:17:06,317 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Starting the resource manager. > 2022-07-19 11:17:06,401 INFO org.apache.flink.client.ClientUtils > [] - Starting program (detached: true) > 2022-07-19 11:17:06,500 WARN > org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap > [] - Application failed unexpectedly: > java.util.concurrent.CompletionException: > org.apache.flink.client.deployment.application.ApplicationExecutionException: > Could not execute application. > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) > ~[?:?] 
> at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) > ~[?:?] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > ~[?:?] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > ~[?:?] > at > org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323) > ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] > at > org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244) > ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > at > org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) > ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] > at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) > ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] > at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) > ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) > [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) > [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1] > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?] 
> at > java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > [?:?] > at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?] > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > [?:?] > at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > [?:?] > Caused by: > org.apache.flink.client.deployment.application.ApplicationExecutionException: > Could not execute application. > ... 14 more > Caused by: org.apache.flink.client.program.ProgramInvocationException: The > main method caused an error: java.lang.RuntimeException: Could not find a > free permitted port on the machine. > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) > ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) > ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] > at > org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) > ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] > at > org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) > ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1] > ... 13 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Could not find a free permitted port on the > machine. > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > ~[?:?] > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?] > at > org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387) > ~[?:?] > at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) > ~[?:?] > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:?] 
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
> at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
> at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
> at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> ... 13 more
> Caused by: java.lang.RuntimeException: Could not find a free permitted port on the machine.
> at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365) ~[?:?]
> at java.lang.Thread.run(Thread.java:834) ~[?:?]
> 2022-07-19 11:17:06,505 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
> java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
> at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
> at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
> at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
> at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
> at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
> at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
> at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
> at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
> at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
> at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
> Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
> ... 14 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.lang.RuntimeException: Could not find a free permitted port on the machine.
> at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> ... 13 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could not find a free permitted port on the machine.
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
> at org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387) ~[?:?]
> at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) ~[?:?]
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
> at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
> at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
> at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> ... 13 more
> Caused by: java.lang.RuntimeException: Could not find a free permitted port on the machine.
> at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
> at org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365) ~[?:?]
> at java.lang.Thread.run(Thread.java:834) ~[?:?]
> 2022-07-19 11:17:06,508 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting StandaloneApplicationClusterEntryPoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
> 2022-07-19 11:17:06,509 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
> {code}
> The same failure occurs with Python 3.7 and Python 3.8.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)