I resolved this by changing jobmanager-rest-service.yaml (changed type to ClusterIP and removed nodePort):

apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-rest
spec:
  type: ClusterIP
  ports:
  - name: rest
    port: 8081
    targetPort: 8081
    #nodePort: 30081
  selector:
    app: flink
    component: jobmanager

On Wed, May 5, 2021 at 10:28 PM Yang Wang <danrtsey...@gmail.com> wrote:

> It seems that you are using NodePort to expose the rest service. If
> you only want to access the Flink UI/rest inside the K8s cluster,
> then I would suggest setting "kubernetes.rest-service.exposed.type" to
> "ClusterIP", because we use the K8s master node to construct the
> JobManager rest endpoint when using NodePort. Sometimes it
> is not accessible due to a firewall.
>
> Best,
> Yang
>
> Robert Metzger <rmetz...@apache.org> wrote on Thu, May 6, 2021 at 2:08 AM:
>
>> Okay, it appears to have resolved 10.43.0.1:30081 as the address of the
>> JobManager. Most likely, the container cannot access this address. Can you
>> validate this from within the container?
>>
>> If I understand the Flink documentation correctly, you should be able to
>> manually specify rest.address and rest.port for the JobManager address. If
>> you can manually figure out an address for the JobManager service and pass
>> it to Flink, the submission should work.
>>
>> On Wed, May 5, 2021 at 7:15 PM Robert Cullen <cinquate...@gmail.com>
>> wrote:
>>
>>> Thanks for the reply. Here is an updated exception with DEBUG on.
>>> It appears to be timing out:
>>>
>>> 2021-05-05 16:56:19,700 DEBUG org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting namespace of Kubernetes client to cmdaa
>>> 2021-05-05 16:56:19,700 DEBUG org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting max concurrent requests of Kubernetes client to 64
>>> 2021-05-05 16:56:20,176 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster flink-jobmanager successfully, JobManager Web Interface: http://10.43.0.1:30081
>>> 2021-05-05 16:56:20,239 INFO org.apache.flink.client.cli.CliFrontend [] - Waiting for response...
>>> 2021-05-05 17:02:09,605 ERROR org.apache.flink.client.cli.CliFrontend [] - Error while running the command.
>>> org.apache.flink.util.FlinkException: Failed to retrieve job list.
>>>     at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) [flink-dist_2.12-1.13.0.jar:1.13.0]
>>> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
>>>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292]
>>>     at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>> Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.43.0.1:30081
>>>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292]
>>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292]
>>>     at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>> Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.43.0.1:30081
>>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
>>>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
>>>
>>> On Wed, May 5, 2021 at 6:59 AM Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>> can you check the client log in the "log/" directory?
>>>> The Flink client will try to access the K8s API server to retrieve the
>>>> endpoint of the JobManager. For that, the pod needs to have permissions
>>>> (through a service account) to make such calls to K8s. My hope is that the
>>>> logs or previous messages give an indication of what Flink is
>>>> trying to do.
>>>> Can you also try running on DEBUG log level? (This should be set in the
>>>> log4j-cli.properties file.)
>>>>
>>>> On Tue, May 4, 2021 at 3:17 PM Robert Cullen <cinquate...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a Flink cluster running in Kubernetes, just the basic
>>>>> installation with one JobManager and two TaskManagers. I want to interact
>>>>> with it via the command line from a separate container, i.e.:
>>>>>
>>>>> root@flink-client:/opt/flink# ./bin/flink list --target kubernetes-application -Dkubernetes.cluster-id=job-manager
>>>>>
>>>>> How do you interact in the same Kubernetes instance via CLI (not from
>>>>> the desktop)?
>>>>> This is the exception:
>>>>>
>>>>> ------------------------------------------------------------
>>>>> The program finished with the following exception:
>>>>>
>>>>> java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of job-manager
>>>>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:103)
>>>>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:145)
>>>>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:67)
>>>>>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1001)
>>>>>     at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427)
>>>>>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060)
>>>>>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>>>>>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>>>>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>>>>> Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of job-manager
>>>>>     ... 9 more
>>>>> root@flink-client:/opt/flink#
>>>>>
>>>>> --
>>>>> Robert Cullen
>>>>> 240-475-4490
>>>>>
>>>>
>>>
>>> --
>>> Robert Cullen
>>> 240-475-4490
>>>
>>

--
Robert Cullen
240-475-4490
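
For anyone who wants to run the "validate this from within the container" check suggested in the thread, a small TCP probe is one option. The following is only a minimal sketch and was not part of the original exchange: it assumes Python 3 is available in the client container, and the function names (can_connect, parse_endpoint) are made up for illustration. It only tests whether a TCP connection can be opened; it does not talk to the Flink REST API.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


def parse_endpoint(text: str) -> tuple:
    """Split a 'host:port' string into (host, port) with port as an int."""
    host, _, port = text.rpartition(":")
    return host, int(port)


if __name__ == "__main__":
    # Each command-line argument is expected to be a host:port pair.
    for arg in sys.argv[1:]:
        host, port = parse_endpoint(arg)
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {status}")
```

Saved under a hypothetical name such as probe.py, it could be run from the client container as `python3 probe.py 10.43.0.1:30081 flink-jobmanager-rest:8081`, using the NodePort address from the DEBUG log and the service name and port from the YAML. If the first address is not reachable while the second is, that would be consistent with the connection timeout in the log and with the ClusterIP change resolving it.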