Hello, I am trying to run Flink on a Kubernetes cluster using minikube and kubectl. I am following this example <https://github.com/sedgewickmm18/flink-kubernetes>, which runs a Flink 1.2 cluster without problems.
I am interested in running Flink 1.5.1, but when I change the Flink version, I start to see the following exceptions in the taskmanager-controller logs:

2018-07-27 07:34:22,429 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2018-07-27 07:34:22,442 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2018-07-27 07:34:22,460 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2018-07-27 07:34:22,622 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2018-07-27 07:34:22,626 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
2018-07-27 07:34:22,626 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2018-07-27 07:34:22,629 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address taskmanager-controller-vncdz/172.17.0.7:6123.
2018-07-27 07:34:23,390 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:34:23,391 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:23,391 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:23,392 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:23,392 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:23,393 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:23,393 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:24,195 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:34:24,196 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:24,197 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:24,198 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:24,198 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:24,199 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:24,199 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:25,803 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:34:25,811 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:25,811 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:25,812 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:25,812 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:25,813 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:25,813 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:29,018 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:34:29,098 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:29,098 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:29,099 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:29,099 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:29,100 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)
2018-07-27 07:34:29,102 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
2018-07-27 07:34:32,628 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to taskmanager-controller-vncdz/172.17.0.7:6123. Selecting a local address using heuristics.
2018-07-27 07:34:32,630 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - TaskManager will use hostname/address 'taskmanager-controller-vncdz' (172.17.0.7) for communication.
2018-07-27 07:34:32,663 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Starting AkkaRpcService at taskmanager-controller-vncdz:0.
2018-07-27 07:34:33,574 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2018-07-27 07:34:34,335 INFO akka.remote.Remoting - Starting remoting
2018-07-27 07:34:34,661 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@taskmanager-controller-vncdz:39769]
2018-07-27 07:34:34,698 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2018-07-27 07:34:34,710 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Created BLOB cache storage directory /tmp/blobStore-376e1f5a-810b-4999-91eb-ca5292b50d12
2018-07-27 07:34:34,714 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /tmp/blobStore-fb08f586-2992-4d4a-9e75-ed501bbdc4e3
2018-07-27 07:34:34,718 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: taskmanager-controller-vncdz/172.17.0.7, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2018-07-27 07:34:34,916 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 16 GB, usable 12 GB (75.00% usable)
2018-07-27 07:34:35,605 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 102 MB for network buffer pool (number of memory segments: 3278, bytes per segment: 32768).
2018-07-27 07:34:35,899 INFO org.apache.flink.runtime.query.QueryableStateUtils - Could not load Queryable State Client Proxy. Probable reason: flink-queryable-state-runtime is not in the classpath. To enable Queryable State, please move the flink-queryable-state-runtime jar from the opt to the lib folder.
2018-07-27 07:34:35,900 INFO org.apache.flink.runtime.query.QueryableStateUtils - Could not load Queryable State Server. Probable reason: flink-queryable-state-runtime is not in the classpath. To enable Queryable State, please move the flink-queryable-state-runtime jar from the opt to the lib folder.
2018-07-27 07:34:35,901 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components.
2018-07-27 07:34:35,946 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful initialization (took 37 ms).
2018-07-27 07:34:35,988 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 42 ms). Listening on SocketAddress /172.17.0.7:41451.
2018-07-27 07:34:35,990 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 0.7 of the currently free heap space (641 MB), memory will be allocated lazily.
2018-07-27 07:34:36,000 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-48184970-5e3d-4ae7-9ba4-40850532367a for spill files.
2018-07-27 07:34:36,008 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-adbfd785-de17-48ae-8677-cf360db1fac2
2018-07-27 07:34:36,199 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
2018-07-27 07:34:36,211 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2018-07-27 07:34:36,226 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job leader service.
2018-07-27 07:34:36,231 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager(00000000000000000000000000000000).
2018-07-27 07:34:36,513 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:34:36,513 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:34:36,520 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:34:47,228 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:34:47,233 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:34:47,234 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:34:57,255 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:34:57,255 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:34:57,256 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:35:07,274 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:35:07,276 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:35:07,276 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:35:17,294 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:35:17,300 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:35:17,300 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:35:27,315 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:35:27,316 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:35:27,318 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:35:37,340 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:35:37,341 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:35:37,343 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:35:47,364 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:35:47,365 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:35:47,365 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
2018-07-27 07:35:57,385 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123
2018-07-27 07:35:57,387 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
2018-07-27 07:35:57,387 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

Could anyone point out what is wrong? This is my taskmanager controller file: <https://github.com/sedgewickmm18/flink-kubernetes/blob/master/taskmanager-controller.yaml>. Also, could someone please point me to any other docs, if they exist, about running Flink 1.5 end to end on Kubernetes?

Thanks,
Vipul