Okay, the client appears to have resolved 10.43.0.1:30081 as the address of
the JobManager. Most likely, the container cannot reach this address. Can you
verify this from within the container?

If I understand the Flink documentation correctly, you should be able to
manually specify rest.address and rest.port for the JobManager. If you can
determine an address at which the JobManager service is reachable from the
container, and pass it to Flink, the submission should work.
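
One way to check reachability from inside the client container is a plain TCP
connection attempt. The snippet below is only a minimal sketch: it assumes
Python 3 is available in the container (the stock Flink image may not ship
it), and the helper name can_connect is made up for illustration. The address
is the one reported in the client log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


if __name__ == "__main__":
    # Address taken from the client log in this thread.
    host, port = "10.43.0.1", 30081
    state = "reachable" if can_connect(host, port, timeout=3.0) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If this reports the address as not reachable, the same check can be repeated
against other candidate addresses of the JobManager REST service until one
works, and that one can then be used for rest.address and rest.port.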

On Wed, May 5, 2021 at 7:15 PM Robert Cullen <cinquate...@gmail.com> wrote:

> Thanks for the reply. Here is an updated exception with DEBUG on. It
> appears to be timing out:
>
> 2021-05-05 16:56:19,700 DEBUG 
> org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting 
> namespace of Kubernetes client to cmdaa
> 2021-05-05 16:56:19,700 DEBUG 
> org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting 
> max concurrent requests of Kubernetes client to 64
> 2021-05-05 16:56:20,176 INFO  
> org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Retrieve 
> flink cluster flink-jobmanager successfully, JobManager Web Interface: 
> http://10.43.0.1:30081
> 2021-05-05 16:56:20,239 INFO  org.apache.flink.client.cli.CliFrontend         
>              [] - Waiting for response...
> 2021-05-05 17:02:09,605 ERROR org.apache.flink.client.cli.CliFrontend         
>              [] - Error while running the command.
> org.apache.flink.util.FlinkException: Failed to retrieve job list.
>         at 
> org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449) 
> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430) 
> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427) 
> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060) 
> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) 
> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>  [flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) 
> [flink-dist_2.12-1.13.0.jar:1.13.0]
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: 
> Could not complete the operation. Number of retries has been exhausted.
>         at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>  ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>  ~[?:1.8.0_292]
>         at 
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
> connection timed out: /10.43.0.1:30081
>         at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) 
> ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
>  ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>  ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>  ~[?:1.8.0_292]
>         at 
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
> Caused by: 
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
> connection timed out: /10.43.0.1:30081
>         at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>
>
> On Wed, May 5, 2021 at 6:59 AM Robert Metzger <rmetz...@apache.org> wrote:
>
>> Hi,
>> Can you check the client log in the "log/" directory?
>> The Flink client will try to access the K8s API server to retrieve the
>> endpoint of the JobManager. For that, the pod needs to have permissions
>> (through a service account) to make such calls to K8s. My hope is that the
>> logs or previous messages give an indication of what Flink is trying to
>> do.
>> Can you also try running on DEBUG log level? (This should be set in the
>> log4j-cli.properties file.)
>>
>>
>>
>> On Tue, May 4, 2021 at 3:17 PM Robert Cullen <cinquate...@gmail.com>
>> wrote:
>>
>>> I have a Flink cluster running in Kubernetes, just the basic
>>> installation with one JobManager and two TaskManagers. I want to interact
>>> with it via the command line from a separate container, i.e.:
>>>
>>> root@flink-client:/opt/flink# ./bin/flink list --target 
>>> kubernetes-application -Dkubernetes.cluster-id=job-manager
>>>
>>> How do you interact with the cluster via the CLI from within the same
>>> Kubernetes instance (not from the desktop)? This is the exception:
>>>
>>> ------------------------------------------------------------
>>>  The program finished with the following exception:
>>>
>>> java.lang.RuntimeException: 
>>> org.apache.flink.client.deployment.ClusterRetrieveException: Could not get 
>>> the rest endpoint of job-manager
>>>         at 
>>> org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:103)
>>>         at 
>>> org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:145)
>>>         at 
>>> org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:67)
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1001)
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427)
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060)
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>>>         at 
>>> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>>> Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: 
>>> Could not get the rest endpoint of job-manager
>>>         ... 9 more
>>> root@flink-client:/opt/flink#
>>>
>>> --
>>> Robert Cullen
>>> 240-475-4490
>>>
>>
>
> --
> Robert Cullen
> 240-475-4490
>
