Thanks -- I fixed the DNS setup and it solved the problem.

Le

On Thu, Apr 6, 2023 at 12:19 AM Weihua Hu <huweihua....@gmail.com> wrote:

> Hi, Le
>
> It looks like a DNS issue. Maybe you can try to ping or nslookup the
> 'my-first-flink-cluster-rest.default'
> on flink operator pods to check whether dns service is normal.
>
> Best,
> Weihua
>
>
> On Wed, Apr 5, 2023 at 12:43 PM Le Xu <sharonx...@gmail.com> wrote:
>
>> Hello!
>>
>> I'm trying out the Kubernetes sample
>> <https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes>
>> described in the official doc but I am not able to submit job with the
>> following error:
>>
>> -----------------------------------------------------------------------------------------------------
>> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>>     at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>>     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>>     at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>     at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
>> Caused by: java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:121)
>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:148)
>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:69)
>>     at org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.execute(AbstractSessionClusterExecutor.java:80)
>>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2197)
>>     at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:189)
>>     at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:118)
>>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2058)
>>     at org.apache.flink.streaming.examples.windowing.TopSpeedWindowing.main(TopSpeedWindowing.java:154)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>>     ... 9 more
>> Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>>     ... 23 more
>> Caused by: java.net.UnknownHostException: my-first-flink-cluster-rest.default: Name or service not known
>>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>>     at java.net.InetAddress.getByName(InetAddress.java:1077)
>>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:229)
>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.getWebMonitorAddress(KubernetesClusterDescriptor.java:140)
>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$null$0(KubernetesClusterDescriptor.java:119)
>>     at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:237)
>>     at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:197)
>>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:114)
>>     ... 22 more
>> "
>> -----------------------------------------------------------------------------------------------------
>>
>> My kubernetes service does have DNS running (see the following):
>>
>> -----------------------------------------------------------------------------------------------------
>> root@node0:/mydata/flink-1.17.0# kubectl get pods -n kube-system
>> NAME                                                                   READY   STATUS             RESTARTS      AGE
>> calico-kube-controllers-6d674b5f78-6xjv8                               0/1     CrashLoopBackOff   45 (9s ago)   3h33m
>> calico-node-49qlx                                                      0/1     Running            0             3h33m
>> calico-node-gds4w                                                      0/1     Running            0             3h33m
>> calico-node-rc999                                                      0/1     Running            0             3h33m
>> coredns-787d4945fb-76qw6                                               1/1     Running            0             2d4h
>> coredns-787d4945fb-wwclv                                               1/1     Running            0             2d4h
>> etcd-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us                      1/1     Running            1 (9h ago)    2d4h
>> kube-apiserver-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us            1/1     Running            35 (9h ago)   2d4h
>> kube-controller-manager-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us   1/1     Running            4 (9h ago)    2d4h
>> kube-proxy-8g6zk                                                       1/1     Running            1 (9h ago)    2d4h
>> kube-proxy-p2ph9                                                       1/1     Running            0             7h47m
>> kube-proxy-w2whd                                                       1/1     Running            0             7h41m
>> kube-scheduler-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us            1/1     Running            4 (9h ago)    2d4h
>> -----------------------------------------------------------------------------------------------------
>>
>> And my service appears to be running normally (I'm using my own cluster,
>> changing the exposure type to NodePort produces the similar error):
>>
>> -----------------------------------------------------------------------------------------------------
>> root@node0:/mydata/flink-1.17.0# kubectl get services
>> NAME                          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
>> kubernetes                    ClusterIP   10.96.0.1      <none>        443/TCP             2d4h
>> my-first-flink-cluster        ClusterIP   None           <none>        6123/TCP,6124/TCP   60s
>> my-first-flink-cluster-rest   ClusterIP   10.98.42.188   <none>        8081/TCP            60s
>> -----------------------------------------------------------------------------------------------------
>>
>> Any suggestions on what might be going on with my setup?
>>
>> Thanks!
>>
>> Le
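
For anyone who lands on this thread with the same UnknownHostException: the trace shows the failure coming from InetAddress.getByName, so the ping/nslookup check suggested above can also be reproduced from the JVM side. The following is only a minimal sketch under stated assumptions. The class name DnsCheck and its resolve helper are hypothetical and not part of Flink, and the default host name is simply the one from the error message. The result is only meaningful when it is run in the same environment as the Flink client.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Flink or Kubernetes.
final class DnsCheck {

    // Resolves the host with the JVM resolver (InetAddress.getByName, the call
    // seen in the stack trace). Returns the address, or empty if lookup fails.
    static Optional<String> resolve(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Default is the service name from the error message in this thread.
        String host = args.length > 0 ? args[0] : "my-first-flink-cluster-rest.default";
        Optional<String> address = resolve(host);
        if (address.isPresent()) {
            System.out.println(host + " resolves to " + address.get());
        } else {
            System.out.println(host + " does not resolve from this machine");
        }
    }
}
```

Assuming the file is saved as DnsCheck.java, it can be compiled with `javac DnsCheck.java` and run with `java DnsCheck`, optionally passing a different host name as the first argument. If the name resolves inside a pod but not on the machine where the Flink CLI runs, that would suggest the two environments use different DNS configuration.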