I resolved this by changing the jobmanager-rest-service.yaml (Changed type
to ClusterIP and removed nodePort

apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-rest
spec:
  type: ClusterIP
  ports:
  - name: rest
    port: 8081
    targetPort: 8081
    #nodePort: 30081
  selector:
    app: flink
    component: jobmanager


On Wed, May 5, 2021 at 10:28 PM Yang Wang <danrtsey...@gmail.com> wrote:

> It seems that you are using the NodePort to expose the rest service. If
> you only want to access the Flink UI/rest in the K8s cluster,
> then I would suggest to set "kubernetes.rest-service.exposed.type" to
> "ClusterIP". Because we are using the K8s master node to
> construct the JobManager rest endpoint when using NodePort. Sometime, it
> is not accessible due to firewall.
>
> Best,
> Yang
>
> Robert Metzger <rmetz...@apache.org> 于2021年5月6日周四 上午2:08写道:
>
>> Okay, it appears to have resolved 10.43.0.1:30081 as the address of the
>> JobManager. Most likely, the container can not access this address. Can you
>> validate this from within the container?
>>
>> If I understand the Flink documentation correctly, you should be able to
>> manually specify rest.address, rest.port for the JobManager address. If
>> you can manually figure out an address to the JobManager service, and pass
>> it to Flink, the submission should work.
>>
>> On Wed, May 5, 2021 at 7:15 PM Robert Cullen <cinquate...@gmail.com>
>> wrote:
>>
>>> Thanks for the reply. Here is an updated exception with DEBUG on. It
>>> appears to be timing out:
>>>
>>> 2021-05-05 16:56:19,700 DEBUG 
>>> org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting 
>>> namespace of Kubernetes client to cmdaa
>>> 2021-05-05 16:56:19,700 DEBUG 
>>> org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting 
>>> max concurrent requests of Kubernetes client to 64
>>> 2021-05-05 16:56:20,176 INFO  
>>> org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Retrieve 
>>> flink cluster flink-jobmanager successfully, JobManager Web Interface: 
>>> http://10.43.0.1:30081
>>> 2021-05-05 16:56:20,239 INFO  org.apache.flink.client.cli.CliFrontend       
>>>                [] - Waiting for response...
>>> 2021-05-05 17:02:09,605 ERROR org.apache.flink.client.cli.CliFrontend       
>>>                [] - Error while running the command.
>>> org.apache.flink.util.FlinkException: Failed to retrieve job list.
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449) 
>>> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430) 
>>> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427) 
>>> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060) 
>>> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>>  [flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) 
>>> [flink-dist_2.12-1.13.0.jar:1.13.0]
>>> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: 
>>> Could not complete the operation. Number of retries has been exhausted.
>>>         at 
>>> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>> Caused by: java.util.concurrent.CompletionException: 
>>> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
>>> connection timed out: /10.43.0.1:30081
>>>         at 
>>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>>>  ~[?:1.8.0_292]
>>>         at 
>>> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>> Caused by: 
>>> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
>>> connection timed out: /10.43.0.1:30081
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>> ngleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) 
>>> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at 
>>> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>  ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>>
>>>
>>> On Wed, May 5, 2021 at 6:59 AM Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>> can you check the client log in the "log/" directory?
>>>> The Flink client will try to access the K8s API server to retrieve the
>>>> endpoint of the jobmanager. For that, the pod needs to have permissions
>>>> (through a service account) to make such calls to K8s. My hope is that the
>>>> logs or previous messages are giving an indication into what Flink is
>>>> trying to do.
>>>> Can you also try running on DEBUG log level? (should be the
>>>> log4j-cli.properties file).
>>>>
>>>>
>>>>
>>>> On Tue, May 4, 2021 at 3:17 PM Robert Cullen <cinquate...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a flink cluster running in kubernetes, just the basic
>>>>> installation with one JobManager and two TaskManagers. I want to interact
>>>>> with it via command line from a separate container ie:
>>>>>
>>>>> root@flink-client:/opt/flink# ./bin/flink list --target 
>>>>> kubernetes-application -Dkubernetes.cluster-id=job-manager
>>>>>
>>>>> How do you interact in the same kubernetes instance via CLI (Not from
>>>>> the desktop)?  This is the exception:
>>>>>
>>>>> ------------------------------------------------------------
>>>>>  The program finished with the following exception:
>>>>>
>>>>> java.lang.RuntimeException: 
>>>>> org.apache.flink.client.deployment.ClusterRetrieveException: Could not 
>>>>> get the rest endpoint of job-manager
>>>>>         at 
>>>>> org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:103)
>>>>>         at 
>>>>> org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:145)
>>>>>         at 
>>>>> org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:67)
>>>>>         at 
>>>>> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1001)
>>>>>         at 
>>>>> org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427)
>>>>>         at 
>>>>> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060)
>>>>>         at 
>>>>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>>>>>         at 
>>>>> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>>>>         at 
>>>>> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>>>>> Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: 
>>>>> Could not get the rest endpoint of job-manager
>>>>>         ... 9 more
>>>>> root@flink-client:/opt/flink#
>>>>>
>>>>> --
>>>>> Robert Cullen
>>>>> 240-475-4490
>>>>>
>>>>
>>>
>>> --
>>> Robert Cullen
>>> 240-475-4490
>>>
>>

-- 
Robert Cullen
240-475-4490

Reply via email to