It seems that the TaskManager pod could not resolve the JobManager address "franz-01.default", which is constructed as "<k8s-service-name>.<namespace>". Please check whether CoreDNS is running normally in your K8s cluster. You could start a busybox pod on the same node as the TaskManager and then run "nslookup franz-01.default" to verify the DNS resolution.
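If a shell with Python is easier to get than busybox, the same check could be sketched roughly as below. This is only an illustration: the helper names `service_address` and `resolve` are made up for this example, and it assumes the script runs inside a pod on the affected node so that it uses the same cluster DNS settings as the TaskManager.

```python
import socket


def service_address(service_name: str, namespace: str) -> str:
    """Build the short in-cluster name, e.g. "franz-01.default"."""
    return f"{service_name}.{namespace}"


def resolve(hostname: str) -> list:
    """Return the sorted IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Name resolution failed, which is the situation the TaskManager
        # reports as java.net.UnknownHostException.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Cluster id and namespace taken from the logs in this thread.
    host = service_address("franz-01", "default")
    addresses = resolve(host)
    if addresses:
        print(f"{host} resolves to {addresses}")
    else:
        print(f"{host} does not resolve; check CoreDNS and the pod's /etc/resolv.conf")
```

An empty result from inside the pod would point at cluster DNS rather than at Flink itself.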
Best,
Yang

Chesnay Schepler <ches...@apache.org> wrote on Tue, May 11, 2021 at 6:30 PM:

> Pulling in Yang Wang who may shed some light on the matter.
>
> You could also have a look at
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Native-kubernetes-setup-failed-to-start-job-td39066.html
> While the issue was not actually resolved, it may give some hints.
>
> On 5/10/2021 4:40 PM, Valentin Wallyn wrote:
>
> Hi,
>
> I'm trying to use Flink on native Kubernetes
> (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/)
> but I get an error even with the example from the documentation.
>
> The job gets submitted but stays in "created" status until it times out
> after 5 minutes. In the log of the task manager, I can see that the error
> is "Could not resolve ResourceManager address".
>
> What can be the issue?
>
> Here are the logs:
>
> ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=franz-01
>
> *2021-05-10 16:05:00,392 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.address, localhost
> 2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.port, 6123
> 2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.memory.process.size, 1600m
> 2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.memory.process.size, 1728m
> 2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: parallelism.default, 1
> 2021-05-10 16:05:00,396 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.execution.failover-strategy, region
> 2021-05-10 16:05:00,432 INFO org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] - Could not load factory due to missing dependencies.
> 2021-05-10 16:05:02,680 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
> 2021-05-10 16:05:02,690 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
> 2021-05-10 16:05:02,699 INFO org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124
> 2021-05-10 16:05:02,700 INFO org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration taskmanager.rpc.port will be set to 6122
> 2021-05-10 16:05:02,760 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
> 2021-05-10 16:05:05,440 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster franz-01 successfully, JobManager Web Interface: http://xxx:8081*
>
> *Task Manager logs*
>
> *2021-05-10 14:09:05,463 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.framework.off-heap.size=134217728b
> 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.network.max=134217730b
> 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.network.min=134217730b
> 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.framework.heap.size=134217728b
> 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.managed.size=536870920b
> 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.cpu.cores=1.0
> 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.task.heap.size=402653174b
> 2021-05-10 14:09:05,466 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,466 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.task.off-heap.size=0b
> 2021-05-10 14:09:05,466 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,467 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.jvm-metaspace.size=268435456b
> 2021-05-10 14:09:05,467 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,467 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.jvm-overhead.max=201326592b
> 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D
> 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.jvm-overhead.min=201326592b
> 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - --configDir
> 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - /opt/flink/conf
> 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Dtaskmanager.resource-id=franz-01-taskmanager-1-1
> 2021-05-10 14:09:05,471 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.off-heap.size=134217728b
> 2021-05-10 14:09:05,471 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.jvm-overhead.min=201326592b
> 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Dweb.tmpdir=/tmp/flink-web-e60a7b21-4e2b-4b6c-a0ac-5b08816edcee
> 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.jvm-metaspace.size=268435456b
> 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.heap.size=1073741824b
> 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.jvm-overhead.max=201326592b
> 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - Classpath: /opt/flink/lib/flink-csv-1.12.2.jar:/opt/flink/lib/flink-json-1.12.2.jar:/opt/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/opt/flink/lib/flink-table-blink_2.12-1.12.2.jar:/opt/flink/lib/flink-table_2.12-1.12.2.jar:/opt/flink/lib/log4j-1.2-api-2.12.1.jar:/opt/flink/lib/log4j-api-2.12.1.jar:/opt/flink/lib/log4j-core-2.12.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.12.1.jar:/opt/flink/lib/flink-dist_2.12-1.12.2.jar:::
> 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - --------------------------------------------------------------------------------
> 2021-05-10 14:09:05,475 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - Registered UNIX signal handlers for [TERM, HUP, INT]
> 2021-05-10 14:09:05,510 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: blob.server.port, 6124
> 2021-05-10 14:09:05,511 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.memory.process.size, 1728m
> 2021-05-10 14:09:05,511 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesSessionClusterEntrypoint
> 2021-05-10 14:09:05,513 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.execution.failover-strategy, region
> 2021-05-10 14:09:05,514 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.address, franz-01.default
> 2021-05-10 14:09:05,514 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: execution.target, kubernetes-session
> 2021-05-10 14:09:05,515 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.memory.process.size, 1600m
> 2021-05-10 14:09:05,516 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.port, 6123
> 2021-05-10 14:09:05,516 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.cluster-id, franz-01
> 2021-05-10 14:09:05,516 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.rpc.port, 6122
> 2021-05-10 14:09:05,517 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
> 2021-05-10 14:09:05,517 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: parallelism.default, 1
> 2021-05-10 14:09:05,519 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2021-05-10 14:09:05,658 INFO org.apache.flink.core.fs.FileSystem [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
> 2021-05-10 14:09:05,733 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
> 2021-05-10 14:09:05,738 INFO org.apache.flink.runtime.security.modules.JaasModule [] - Jaas file will be created as /tmp/jaas-3361029581556571704.conf.
> 2021-05-10 14:09:05,744 INFO org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
> 2021-05-10 14:09:05,811 INFO org.apache.flink.configuration.Configuration [] - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
> 2021-05-10 14:09:05,855 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
> 2021-05-10 14:09:05,855 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - TaskManager will try to connect for PT10S before falling back to heuristics
> 2021-05-10 14:09:26,116 WARN org.apache.flink.runtime.net.ConnectionUtils [] - Could not connect to franz-01.default:6123. Selecting a local address using heuristics.
> 2021-05-10 14:09:26,116 WARN org.apache.flink.runtime.net.ConnectionUtils [] - Could not find any IPv4 address that is not loopback or link-local. Using localhost address.
> 2021-05-10 14:09:26,117 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - TaskManager will use hostname/address 'franz-01-taskmanager-1-1' (10.2.2.37) for communication.
> 2021-05-10 14:09:26,136 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address 10.2.2.37:6122, bind address 0.0.0.0:6122.
> 2021-05-10 14:09:27,212 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started
> 2021-05-10 14:09:27,283 INFO akka.remote.Remoting [] - Starting remoting
> 2021-05-10 14:09:27,586 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink@10.2.2.37:6122]
> 2021-05-10 14:09:27,730 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink@10.2.2.37:6122
> 2021-05-10 14:09:27,781 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl [] - No metrics reporter configured, no metrics will be exposed/reported.
> 2021-05-10 14:09:27,786 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address 10.2.2.37:0, bind address 0.0.0.0:0.
> 2021-05-10 14:09:27,814 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started
> 2021-05-10 14:09:27,819 INFO akka.remote.Remoting [] - Starting remoting
> 2021-05-10 14:09:27,881 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.2.2.37:39177]
> 2021-05-10 14:09:27,895 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink-metrics@10.2.2.37:39177
> 2021-05-10 14:09:27,916 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService_franz-01-taskmanager-1-1 .
> 2021-05-10 14:09:27,931 INFO org.apache.flink.runtime.blob.PermanentBlobCache [] - Created BLOB cache storage directory /tmp/blobStore-16255e13-c39a-442f-853a-cd1e331e7325
> 2021-05-10 14:09:27,934 INFO org.apache.flink.runtime.blob.TransientBlobCache [] - Created BLOB cache storage directory /tmp/blobStore-5ac02374-808a-4529-b80c-088dbeac2711
> 2021-05-10 14:09:27,955 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
> 2021-05-10 14:09:27,955 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
> 2021-05-10 14:09:27,955 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Starting TaskManager with ResourceID: franz-01-taskmanager-1-1
> 2021-05-10 14:09:27,990 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices [] - Temporary file directory '/tmp': total 48 GB, usable 38 GB (79.17% usable)
> 2021-05-10 14:09:27,997 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager uses directory /tmp/flink-io-c08780a7-90bd-4259-8f51-8a24d95c21df for spill files.
> 2021-05-10 14:09:28,059 INFO org.apache.flink.runtime.io.network.netty.NettyConfig [] - NettyConfig [server address: /0.0.0.0, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: AUTO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
> 2021-05-10 14:09:28,063 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager uses directory /tmp/flink-netty-shuffle-209ae6cc-6fd5-4c9e-b6df-acc675a6995c for spill files.
> 2021-05-10 14:09:28,578 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool [] - Allocated 128 MB for network buffer pool (number of memory segments: 4096, bytes per segment: 32768).
> 2021-05-10 14:09:28,594 INFO org.apache.flink.runtime.io.network.NettyShuffleEnvironment [] - Starting the network environment and its components.
> 2021-05-10 14:09:28,789 INFO org.apache.flink.runtime.io.network.netty.NettyClient [] - Transport type 'auto': using EPOLL.
> 2021-05-10 14:09:28,791 INFO org.apache.flink.runtime.io.network.netty.NettyClient [] - Successful initialization (took 196 ms).
> 2021-05-10 14:09:28,796 INFO org.apache.flink.runtime.io.network.netty.NettyServer [] - Transport type 'auto': using EPOLL.
> 2021-05-10 14:09:28,892 INFO org.apache.flink.runtime.io.network.netty.NettyServer [] - Successful initialization (took 99 ms). Listening on SocketAddress /0:0:0:0:0:0:0:0%0:40399.
> 2021-05-10 14:09:28,894 INFO org.apache.flink.runtime.taskexecutor.KvStateService [] - Starting the kvState service and its components.
> 2021-05-10 14:09:28,979 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
> 2021-05-10 14:09:29,002 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Start job leader service.
> 2021-05-10 14:09:29,005 INFO org.apache.flink.runtime.filecache.FileCache [] - User file cache uses directory /tmp/flink-dist-cache-bc340200-15c9-4d0a-950a-f43469bdb58d
> 2021-05-10 14:09:29,055 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
> 2021-05-10 14:09:29,276 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@franz-01.default:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@franz-01.default:6123]] Caused by: [java.net.UnknownHostException: franz-01.default]
> 2021-05-10 14:09:29,289 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*.
> 2021-05-10 14:09:49,314 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*.
> 2021-05-10 14:09:59,325 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@franz-01.default:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@franz-01.default:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
> 2021-05-10 14:09:59,327 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*.
> 2021-05-10 14:10:19,365 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*.
> 2021-05-10 14:10:29,363 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@franz-01.default:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@franz-01.default:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
> 2021-05-10 14:10:29,385 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*.
> 2021-05-10 14:10:49,425 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*.
> 2021-05-10 14:10:59,423 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@franz-01.default:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@franz-01.default:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
> 2021-05-10 14:10:59,445 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*. *
>
> *Job Manager logs*
>
> *2021-05-10 14:09:00,393 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
> 2021-05-10 14:09:00,395 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
> 2021-05-10 14:09:00,524 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
> 2021-05-10 14:09:00,537 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
> 2021-05-10 14:09:00,612 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
> 2021-05-10 14:09:00,665 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
> 2021-05-10 14:09:00,666 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
> 2021-05-10 14:09:00,707 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 15 ms
> 2021-05-10 14:09:00,742 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
> 2021-05-10 14:09:00,823 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No checkpoint found during restore.
> 2021-05-10 14:09:00,830 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@43519311 for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
> 2021-05-10 14:09:00,844 INFO org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@franz-01.default:6123/user/rpc/jobmanager_2.
> 2021-05-10 14:09:00,848 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting execution of job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) under job master id 00000000000000000000000000000000.
> 2021-05-10 14:09:00,851 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
> 2021-05-10 14:09:00,852 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) switched from state CREATED to RUNNING.
> 2021-05-10 14:09:00,912 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Filter -> Timestamps/Watermarks (1/1) (b10791bc97d1d772bd443abd92bf32c0) switched from CREATED to SCHEDULED.
> 2021-05-10 14:09:00,913 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Window(ProcessingTimeSessionWindows(60000), ProcessingTimeTrigger, SessionAggregate, PassThroughWindowFunction) -> Sink: Unnamed (1/1) (9ee57af7f96b318d16fb0784a693b481) switched from CREATED to SCHEDULED.
> 2021-05-10 14:09:00,928 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}]
> 2021-05-10 14:09:00,939 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@franz-01.default:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
> 2021-05-10 14:09:00,947 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
> 2021-05-10 14:09:00,949 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registering job manager 00000000000000000000000000000...@akka.tcp://flink@franz-01.default:6123/user/rpc/jobmanager_2 for job a63f806ba9a172b728395266a6dc41fe.
> 2021-05-10 14:09:01,009 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager 00000000000000000000000000000...@akka.tcp://flink@franz-01.default:6123/user/rpc/jobmanager_2 for job a63f806ba9a172b728395266a6dc41fe.
> 2021-05-10 14:09:01,016 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
> 2021-05-10 14:09:01,018 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}] and profile ResourceProfile{UNKNOWN} with allocation id be6a056136c6dec065af876bda1f6dd5 from resource manager.
> 2021-05-10 14:09:01,020 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job a63f806ba9a172b728395266a6dc41fe with allocation id be6a056136c6dec065af876bda1f6dd5.
> 2021-05-10 14:09:01,029 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}, current pending count: 1.
> 2021-05-10 14:09:01,035 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
> 2021-05-10 14:09:01,414 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating new TaskManager pod with name franz-01-taskmanager-1-1 and resource <1728,1.0>.
> 2021-05-10 14:09:01,739 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod franz-01-taskmanager-1-1 is created.
> 2021-05-10 14:09:01,807 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received new TaskManager pod: franz-01-taskmanager-1-1
> 2021-05-10 14:09:01,808 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker franz-01-taskmanager-1-1 with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}.*
>
> Help appreciated. Thanks!