Hi Le,

It looks like a DNS issue. You could try running ping or nslookup against
'my-first-flink-cluster-rest.default' from the Flink operator pods to check
whether the DNS service is working normally.
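
A minimal sketch of such a check, assuming a Linux host with `getent` and `kubectl` access to the cluster. The function name `check_local`, the pod name `dns-test` and the `busybox:1.28` image are illustrative choices, not part of the Flink docs; a throwaway busybox pod is used in case the Flink pods do not ship `nslookup`:

```shell
#!/bin/sh
# Service name taken from the UnknownHostException; pass another name as $1 to override.
HOST="${1:-my-first-flink-cluster-rest.default}"

# Resolve a name through this host's system resolver (NSS) with getent,
# i.e. on the machine where `flink run` is executed (presumably node0).
check_local() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "local: $1 resolves"
  else
    echo "local: $1 does NOT resolve"
  fi
}

check_local "$HOST"

# Resolve the same name from inside the cluster with a throwaway pod
# (hypothetical pod name "dns-test"; --rm deletes the pod when nslookup exits).
if command -v kubectl >/dev/null 2>&1; then
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.28 -- nslookup "$HOST"
fi
```

Comparing the two results shows whether the name resolves only inside the cluster or not at all.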

Best,
Weihua


On Wed, Apr 5, 2023 at 12:43 PM Le Xu <sharonx...@gmail.com> wrote:

> Hello!
>
> I'm trying out the Kubernetes session example
> <https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes>
> described in the official docs, but I am not able to submit a job. It fails
> with the following error:
>
>
> -----------------------------------------------------------------------------------------------------
> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>         at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>         at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>         at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>         at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>         at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>         at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>         at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:121)
>         at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:148)
>         at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:69)
>         at org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.execute(AbstractSessionClusterExecutor.java:80)
>         at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2197)
>         at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:189)
>         at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:118)
>         at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2058)
>         at org.apache.flink.streaming.examples.windowing.TopSpeedWindowing.main(TopSpeedWindowing.java:154)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>         ... 9 more
> Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>         ... 23 more
> Caused by: java.net.UnknownHostException: my-first-flink-cluster-rest.default: Name or service not known
>         at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>         at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>         at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>         at java.net.InetAddress.getByName(InetAddress.java:1077)
>         at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:229)
>         at org.apache.flink.kubernetes.KubernetesClusterDescriptor.getWebMonitorAddress(KubernetesClusterDescriptor.java:140)
>         at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$null$0(KubernetesClusterDescriptor.java:119)
>         at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:237)
>         at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:197)
>         at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:114)
>         ... 22 more
>
>
> -----------------------------------------------------------------------------------------------------
>
>
> My Kubernetes cluster does have DNS running (see below):
>
>
>
> -----------------------------------------------------------------------------------------------------
> root@node0:/mydata/flink-1.17.0# kubectl get pods -n kube-system
> NAME                                                                   READY   STATUS             RESTARTS      AGE
> calico-kube-controllers-6d674b5f78-6xjv8                               0/1     CrashLoopBackOff   45 (9s ago)   3h33m
> calico-node-49qlx                                                      0/1     Running            0             3h33m
> calico-node-gds4w                                                      0/1     Running            0             3h33m
> calico-node-rc999                                                      0/1     Running            0             3h33m
> coredns-787d4945fb-76qw6                                               1/1     Running            0             2d4h
> coredns-787d4945fb-wwclv                                               1/1     Running            0             2d4h
> etcd-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us                      1/1     Running            1 (9h ago)    2d4h
> kube-apiserver-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us            1/1     Running            35 (9h ago)   2d4h
> kube-controller-manager-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us   1/1     Running            4 (9h ago)    2d4h
> kube-proxy-8g6zk                                                       1/1     Running            1 (9h ago)    2d4h
> kube-proxy-p2ph9                                                       1/1     Running            0             7h47m
> kube-proxy-w2whd                                                       1/1     Running            0             7h41m
> kube-scheduler-node0.dirigo-newfs.wisr-pg0.utah.cloudlab.us            1/1     Running            4 (9h ago)    2d4h
>
> -----------------------------------------------------------------------------------------------------
>
> And the services appear to be running normally (I'm using my own cluster;
> changing the exposed service type to NodePort produces a similar error):
>
>
> -----------------------------------------------------------------------------------------------------
> root@node0:/mydata/flink-1.17.0# kubectl get services
> NAME                          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
> kubernetes                    ClusterIP   10.96.0.1      <none>        443/TCP             2d4h
> my-first-flink-cluster        ClusterIP   None           <none>        6123/TCP,6124/TCP   60s
> my-first-flink-cluster-rest   ClusterIP   10.98.42.188   <none>        8081/TCP            60s
>
> -----------------------------------------------------------------------------------------------------
>
> Any suggestions on what might be going on with my setup?
>
> Thanks!
>
> Le
>
>
