[ 
https://issues.apache.org/jira/browse/FLINK-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568685#comment-17568685
 ] 

Clive Wong commented on FLINK-28613:
------------------------------------

Turns out it's because of this change:
[https://github.com/apache/flink/blob/adbf09fb941c8f793df6d322ed95df87bc4254f3/flink-core/src/main/java/org/apache/flink/util/NetUtils.java#L166]

which attempts to write to a path that Flink doesn't have access to. We fixed it 
by running chmod on the path it tries to write to (the Flink bin path) in the container.

I'd recommend adding an option, such as an env variable, so that the FileLock can 
be created in a different directory.
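
To make the suggestion concrete, here is a minimal sketch of the idea, not 
Flink's actual code. It uses plain java.nio file locking rather than Flink's 
own FileLock class, and the env variable name PORT_LOCK_DIR, the class 
PortLockSketch and its helper methods are all hypothetical. The lock directory 
is read from the environment and falls back to java.io.tmpdir when unset.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class PortLockSketch {

    // Hypothetical variable name; Flink does not define it.
    static final String LOCK_DIR_ENV = "PORT_LOCK_DIR";

    /** Returns the env-provided directory if set and non-empty, otherwise java.io.tmpdir. */
    static Path resolveLockDir(String envValue) {
        if (envValue != null && !envValue.isEmpty()) {
            return Paths.get(envValue);
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    /** Tries to lock a per-port file in lockDir; returns null if another process holds it. */
    static FileLock tryLockPort(Path lockDir, int port) throws IOException {
        Files.createDirectories(lockDir);
        Path lockFile = lockDir.resolve("port-" + port + ".lock");
        FileChannel channel =
                FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (IOException | RuntimeException e) {
            // Do not leak the channel if locking fails.
            channel.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockDir = resolveLockDir(System.getenv(LOCK_DIR_ENV));
        // Port 0 asks the OS for any free port.
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort();
            FileLock lock = tryLockPort(lockDir, port);
            System.out.println("port=" + port + " locked=" + (lock != null) + " dir=" + lockDir);
            if (lock != null) {
                lock.channel().close(); // closing the channel releases the lock
            }
        }
    }
}
```

With PORT_LOCK_DIR pointing at a writable directory in the container, the lock 
file would be created there rather than under the working directory, so the 
chmod workaround would no longer be needed.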

> PyFlink 1.15 unable to start in Application Mode in k8s
> -------------------------------------------------------
>
>                 Key: FLINK-28613
>                 URL: https://issues.apache.org/jira/browse/FLINK-28613
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.15.1
>            Reporter: Clive Wong
>            Priority: Major
>
> I recently bumped my PyFlink job from 1.14 to 1.15, and the job is failing 
> with the 1.15 build in k8s.
> The error is due to NetUtils.getAvailablePort being unable to find a port. I 
> suspect this is related to the version bump of py4j from 0.10.8.1 to 0.10.9.3 
> required by apache-flink 1.15 in Python.
> The error stack is:
> {code:java}
> 2022-07-19 11:17:06,225 INFO  
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Start SessionDispatcherLeaderProcess.
> 2022-07-19 11:17:06,226 INFO  
> org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
> Starting resource manager service.
> 2022-07-19 11:17:06,227 INFO  
> org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
> Resource manager service is granted leadership with session id 
> 00000000-0000-0000-0000-000000000000.
> 2022-07-19 11:17:06,229 INFO  
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Recover all persisted job graphs that are not finished, yet.
> 2022-07-19 11:17:06,229 INFO  
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Successfully recovered 0 persisted job graphs.
> 2022-07-19 11:17:06,306 INFO  
> org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting 
> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at 
> akka://flink/user/rpc/dispatcher_0 .
> 2022-07-19 11:17:06,309 INFO  
> org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting 
> RPC endpoint for 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at 
> akka://flink/user/rpc/resourcemanager_1 .
> 2022-07-19 11:17:06,317 INFO  
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Starting the resource manager.
> 2022-07-19 11:17:06,401 INFO  org.apache.flink.client.ClientUtils             
>              [] - Starting program (detached: true)
> 2022-07-19 11:17:06,500 WARN  
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap 
> [] - Application failed unexpectedly: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at 
> org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) 
> [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
>  [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
>     at 
> java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  [?:?]
>     at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
> [?:?]
>     at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) 
> [?:?]
> Caused by: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     ... 14 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> main method caused an error: java.lang.RuntimeException: Could not find a 
> free permitted port on the machine.
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Could not find a free permitted port on the 
> machine.
>     at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) 
> ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387)
>  ~[?:?]
>     at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) 
> ~[?:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.lang.RuntimeException: Could not find a free permitted port 
> on the machine.
>     at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:834) ~[?:?]
> 2022-07-19 11:17:06,505 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error 
> occurred in the cluster entrypoint.
> java.util.concurrent.CompletionException: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:323)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:244)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at 
> org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41)
>  ~[flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) 
> [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
>  [flink-rpc-akka_73d9230b-9d22-4143-8bbc-2ab5d539166f.jar:1.15.0-stream1]
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
>     at 
> java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  [?:?]
>     at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
> [?:?]
>     at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) 
> [?:?]
> Caused by: 
> org.apache.flink.client.deployment.application.ApplicationExecutionException: 
> Could not execute application.
>     ... 14 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> main method caused an error: java.lang.RuntimeException: Could not find a 
> free permitted port on the machine.
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Could not find a free permitted port on the 
> machine.
>     at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) 
> ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.startGatewayServer(PythonEnvUtils.java:387)
>  ~[?:?]
>     at org.apache.flink.client.python.PythonDriver.main(PythonDriver.java:75) 
> ~[?:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291)
>  ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     ... 13 more
> Caused by: java.lang.RuntimeException: Could not find a free permitted port 
> on the machine.
>     at org.apache.flink.util.NetUtils.getAvailablePort(NetUtils.java:177) 
> ~[flink-dist-1.15.0-stream1.jar:1.15.0-stream1]
>     at 
> org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonEnvUtils.java:365)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:834) ~[?:?]
> 2022-07-19 11:17:06,508 INFO  
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting 
> StandaloneApplicationClusterEntryPoint down with application status UNKNOWN. 
> Diagnostics Cluster entrypoint has been closed externally..
> 2022-07-19 11:17:06,509 INFO  org.apache.flink.runtime.blob.BlobServer        
>              [] - Stopped BLOB server at 0.0.0.0:6124 {code}
> The same happens with Python 3.7 and Python 3.8.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
