Sorry, I forgot that the JobManager binds its RPC address to flink-jobmanager, not to its IP address. So you also need to update jobmanager-session-deployment.yaml with the following changes.
...
      containers:
      - name: jobmanager
        env:
        - name: JM_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: flink:1.11
        args: ["jobmanager", "$(JM_IP)"]
...

After that, the JobManager binds its RPC address to its pod IP.

Best,
Yang

superainbower <superainbo...@163.com> wrote on Thu, Sep 3, 2020 at 11:38 AM:

> Hi Yang,
> I updated taskmanager-session-deployment.yaml like this:
>
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-taskmanager
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: flink
>       component: taskmanager
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: taskmanager
>     spec:
>       containers:
>       - name: taskmanager
>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>         args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
>         ports:
>         - containerPort: 6122
>           name: rpc
>         - containerPort: 6125
>           name: query-state
>         livenessProbe:
>           tcpSocket:
>             port: 6122
>           initialDelaySeconds: 30
>           periodSeconds: 60
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf/
>         securityContext:
>           runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       imagePullSecrets:
>       - name: regcred
>
> Then I deleted the TaskManager pod and let it restart, but the logs print this:
>
> Could not resolve ResourceManager address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*
>
> The address in the log changed from flink-jobmanager to 172.18.0.5.
>
> superainbower
> superainbo...@163.com
>
> On 09/3/2020 11:09, Yang Wang <danrtsey...@gmail.com> wrote:
>
> I guess something is wrong with your kube-proxy, which prevents the TaskManager from connecting to the JobManager.
> You could verify this by using the JobManager pod IP directly instead of the service name.
>
> Please do as follows.
> * Edit the TaskManager deployment (via kubectl edit deployment flink-taskmanager) and update the args field to the following, given that "172.18.0.5" is the JobManager pod IP.
>   args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
> * Delete the current TaskManager pod and let it restart.
> * Check the TaskManager logs to see whether it registers successfully.
>
> Best,
> Yang
>
> superainbower <superainbo...@163.com> wrote on Thu, Sep 3, 2020 at 9:35 AM:
>
>> Hi Till,
>> I found something that may be helpful.
>> The Kubernetes Dashboard shows the JobManager IP as 172.18.0.5 and the TaskManager IP as 172.18.0.6.
>> When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash' and then 'ping 172.18.0.5', I get a response.
>> But when I ping flink-jobmanager, there is no response.
>>
>> superainbower
>> superainbo...@163.com
>>
>> On 09/3/2020 09:03, superainbower <superainbo...@163.com> wrote:
>>
>> Hi Till,
>> This is the TaskManager log.
>> As you can see, the log prints 'line 92 -- Could not connect to flink-jobmanager:6123'
>> and then 'line 128 -- Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.'
>> It then prints this repeatedly.
>>
>> A few minutes later, the TaskManager shuts down and restarts.
>>
>> These are my YAML files. Could you help me check whether I omitted something? Thanks a lot!
>> ---------------------------------------------------
>> flink-configuration-configmap.yaml
>> apiVersion: v1
>> kind: ConfigMap
>> metadata:
>>   name: flink-config
>>   labels:
>>     app: flink
>> data:
>>   flink-conf.yaml: |+
>>     jobmanager.rpc.address: flink-jobmanager
>>     taskmanager.numberOfTaskSlots: 1
>>     blob.server.port: 6124
>>     jobmanager.rpc.port: 6123
>>     taskmanager.rpc.port: 6122
>>     queryable-state.proxy.ports: 6125
>>     jobmanager.memory.process.size: 1024m
>>     taskmanager.memory.process.size: 1024m
>>     parallelism.default: 1
>>   log4j-console.properties: |+
>>     rootLogger.level = INFO
>>     rootLogger.appenderRef.console.ref = ConsoleAppender
>>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>>     logger.akka.name = akka
>>     logger.akka.level = INFO
>>     logger.kafka.name= org.apache.kafka
>>     logger.kafka.level = INFO
>>     logger.hadoop.name = org.apache.hadoop
>>     logger.hadoop.level = INFO
>>     logger.zookeeper.name = org.apache.zookeeper
>>     logger.zookeeper.level = INFO
>>     appender.console.name = ConsoleAppender
>>     appender.console.type = CONSOLE
>>     appender.console.layout.type = PatternLayout
>>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>     appender.rolling.name = RollingFileAppender
>>     appender.rolling.type = RollingFile
>>     appender.rolling.append = false
>>     appender.rolling.fileName = ${sys:log.file}
>>     appender.rolling.filePattern = ${sys:log.file}.%i
>>     appender.rolling.layout.type = PatternLayout
>>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>     appender.rolling.policies.type = Policies
>>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>>     appender.rolling.policies.size.size=100MB
>>     appender.rolling.strategy.type = DefaultRolloverStrategy
>>     appender.rolling.strategy.max = 10
>>     logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>>     logger.netty.level = OFF
>> ---------------------------------------------------
>> jobmanager-service.yaml
>> apiVersion: v1
>> kind: Service
>> metadata:
>>   name: flink-jobmanager
>> spec:
>>   type: ClusterIP
>>   ports:
>>   - name: rpc
>>     port: 6123
>>   - name: blob-server
>>     port: 6124
>>   - name: webui
>>     port: 8081
>>   selector:
>>     app: flink
>>     component: jobmanager
>> --------------------------------------------------
>> jobmanager-session-deployment.yaml
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>>   name: flink-jobmanager
>> spec:
>>   replicas: 1
>>   selector:
>>     matchLabels:
>>       app: flink
>>       component: jobmanager
>>   template:
>>     metadata:
>>       labels:
>>         app: flink
>>         component: jobmanager
>>     spec:
>>       containers:
>>       - name: jobmanager
>>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>         args: ["jobmanager"]
>>         ports:
>>         - containerPort: 6123
>>           name: rpc
>>         - containerPort: 6124
>>           name: blob-server
>>         - containerPort: 8081
>>           name: webui
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6123
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>         volumeMounts:
>>         - name: flink-config-volume
>>           mountPath: /opt/flink/conf
>>         securityContext:
>>           runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
>>       volumes:
>>       - name: flink-config-volume
>>         configMap:
>>           name: flink-config
>>           items:
>>           - key: flink-conf.yaml
>>             path: flink-conf.yaml
>>           - key: log4j-console.properties
>>             path: log4j-console.properties
>>       imagePullSecrets:
>>       - name: regcred
>> ---------------------------------------------------
>> taskmanager-session-deployment.yaml
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>>   name: flink-taskmanager
>> spec:
>>   replicas: 1
>>   selector:
>>     matchLabels:
>>       app: flink
>>       component: taskmanager
>>   template:
>>     metadata:
>>       labels:
>>         app: flink
>>         component: taskmanager
>>     spec:
>>       containers:
>>       - name: taskmanager
>>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>         args: ["taskmanager"]
>>         ports:
>>         - containerPort: 6122
>>           name: rpc
>>         - containerPort: 6125
>>           name: query-state
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6122
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>         volumeMounts:
>>         - name: flink-config-volume
>>           mountPath: /opt/flink/conf/
>>         securityContext:
>>           runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
>>       volumes:
>>       - name: flink-config-volume
>>         configMap:
>>           name: flink-config
>>           items:
>>           - key: flink-conf.yaml
>>             path: flink-conf.yaml
>>           - key: log4j-console.properties
>>             path: log4j-console.properties
>>       imagePullSecrets:
>>       - name: regcred
>>
>> superainbower
>> superainbo...@163.com
>>
>> On 09/2/2020 20:38, Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally you set the log level to debug. Thanks a lot.
>>
>> Cheers,
>> Till
>>
>> On Wed, Sep 2, 2020 at 12:45 PM art <superainbo...@163.com> wrote:
>>
>>> Hi Till,
>>>
>>> The full output of 'kubectl get all' looks like this:
>>>
>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s
>>>
>>> NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
>>> service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
>>> service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h
>>>
>>> NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
>>> deployment.apps/flink-jobmanager    1/1     1            1           2m34s
>>> deployment.apps/flink-taskmanager   1/1     1            1           2m34s
>>>
>>> NAME                                           DESIRED   CURRENT   READY   AGE
>>> replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
>>> replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s
>>>
>>> I can open the Flink UI, but it shows 0 TaskManagers, so the JobManager itself works fine.
>>> I think the problem is that the TaskManager cannot register itself with the JobManager. Did I miss some configuration?
>>>
>>>
>>> On Sep 2, 2020, at 5:24 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>> Hi art,
>>>
>>> could you check what `kubectl get services` returns? Usually if you run `kubectl get all` you should also see the services. But in your case there are no services listed. You have to see something like service/flink-jobmanager; otherwise the flink-jobmanager service (K8s service) is not running.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Sep 2, 2020 at 11:15 AM art <superainbo...@163.com> wrote:
>>>
>>>> Hi Till,
>>>>
>>>> I'm sure the jobmanager-service is started; I can find it in the Kubernetes Dashboard.
>>>>
>>>> When I run 'kubectl get deployment' I get this:
>>>> flink-jobmanager    1/1   1   1   33s
>>>> flink-taskmanager   1/1   1   1   33s
>>>>
>>>> When I run 'kubectl get all' I get this:
>>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
>>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s
>>>>
>>>> So I think flink-jobmanager works fine, but the TaskManager is restarted every few minutes.
>>>>
>>>> My minikube version: v1.12.3
>>>> Flink version: v1.11.1
>>>>
>>>> On Sep 2, 2020, at 4:27 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>
>>>> Hi art,
>>>>
>>>> could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It could also help to know the Minikube and K8s version you are using.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Sep 2, 2020 at 9:50 AM art <superainbo...@163.com> wrote:
>>>>
>>>>> Hi, I'm trying to deploy Flink on minikube, following
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
>>>>>
>>>>> kubectl create -f flink-configuration-configmap.yaml
>>>>> kubectl create -f jobmanager-service.yaml
>>>>> kubectl create -f jobmanager-session-deployment.yaml
>>>>> kubectl create -f taskmanager-session-deployment.yaml
>>>>>
>>>>> But I got this:
>>>>>
>>>>> 2020-09-02 06:45:42,664 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
>>>>> 2020-09-02 06:45:42,691 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>> 2020-09-02 06:46:02,731 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>> 2020-09-02 06:46:12,731 INFO akka.remote.transport.ProtocolStateActor [] - No response from remote for outbound association. Associate timed out after [20000 ms].
>>>>>
>>>>> And when I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager', I find that I cannot ping flink-jobmanager from the TaskManager.
>>>>>
>>>>> I am new to K8s. Can anyone point me to a tutorial? Thanks a lot!
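
The UnknownHostException in the log above points at name resolution rather than at Flink itself, and ping against a Service name is generally not a reliable test, since a ClusterIP is a virtual address that typically only handles the declared TCP/UDP ports. A rough sketch of a more direct check, run from a shell inside the TaskManager pod, could look like this. The function name check_endpoint is hypothetical, and the sketch assumes that bash, getent and timeout are available in the image, which is not confirmed anywhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flink or kubectl): check_endpoint HOST PORT
# Returns 0 if HOST resolves and PORT accepts a TCP connection,
#         1 if the name does not resolve,
#         2 if the name resolves but the TCP connect fails.
check_endpoint() {
  local host="$1" port="$2"

  # Step 1: name resolution, the step that failed with UnknownHostException.
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS: cannot resolve $host"
    return 1
  fi

  # Step 2: TCP connect through bash's /dev/tcp pseudo-device, capped at 3 seconds.
  if ! timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "TCP: $host resolves, but port $port is not reachable"
    return 2
  fi

  echo "OK: $host:$port is reachable"
}
```

Calling check_endpoint flink-jobmanager 6123 and then check_endpoint 172.18.0.5 6123 separates the two cases discussed in the thread: a return code of 1 for the service name suggests a cluster DNS problem, while a return code of 2 for the pod IP suggests that the JobManager is not accepting connections on that address.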