Sorry i forget that the JobManager is binding its rpc address to
flink-jobmanager, not the ip address.
So you need to also update the jobmanager-session-deployment.yaml with
following changes.
...
containers:
- name: jobmanager
env:
- name: JM_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
image: flink:1.11
args: ["jobmanager", "$(JM_IP)"]
...
After then the JobManager is binding the rpc address with its ip.
Best,
Yang
superainbower <[email protected]> 于2020年9月3日周四 上午11:38写道:
> HI Yang,
> I update taskmanager-session-deployment.yaml like this:
>
> apiVersion: apps/v1
> kind: Deployment
> metadata:
> name: flink-taskmanager
> spec:
> replicas: 1
> selector:
> matchLabels:
> app: flink
> component: taskmanager
> template:
> metadata:
> labels:
> app: flink
> component: taskmanager
> spec:
> containers:
> - name: taskmanager
> image:
> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
> args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
> ports:
> - containerPort: 6122
> name: rpc
> - containerPort: 6125
> name: query-state
> livenessProbe:
> tcpSocket:
> port: 6122
> initialDelaySeconds: 30
> periodSeconds: 60
> volumeMounts:
> - name: flink-config-volume
> mountPath: /opt/flink/conf/
> securityContext:
> runAsUser: 9999 # refers to user _flink_ from official flink
> image, change if necessary
> volumes:
> - name: flink-config-volume
> configMap:
> name: flink-config
> items:
> - key: flink-conf.yaml
> path: flink-conf.yaml
> - key: log4j-console.properties
> path: log4j-console.properties
> imagePullSecrets:
> - name: regcred
>
> And Delete the TaskManager pod and restart it , but the logs print this
>
> Could not resolve ResourceManager address akka.tcp://
> [email protected]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms:
> Could not connect to rpc endpoint under address akka.tcp://
> [email protected]:6123/user/rpc/resourcemanager_*
>
> It change flink-jobmanager to 172.18.0.5
> superainbower
> [email protected]
>
> <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=superainbower&uid=superainbower%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22superainbower%40163.com%22%5D>
> 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail81> 定制
>
> On 09/3/2020 11:09,Yang Wang<[email protected]>
> <[email protected]> wrote:
>
> I guess something is wrong with your kube proxy, which causes TaskManager
> could not connect to JobManager.
> You could verify this by directly using JobManager Pod ip instead of
> service name.
>
> Please do as follows.
> * Edit the TaskManager deployment(via kubectl edit flink-taskmanager) and
> update the args field to the following.
> args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"] Given
> that "172.18.0.5" is the JobManager pod ip.
> * Delete the current TaskManager pod and let restart again
> * Now check the TaskManager logs to check whether it could register
> successfully
>
>
>
> Best,
> Yang
>
> superainbower <[email protected]> 于2020年9月3日周四 上午9:35写道:
>
>> Hi Till,
>> I find something may be helpful.
>> The kubernetes Dashboard show job-manager ip 172.18.0.5, task-manager ip
>> 172.18.0.6
>> When I run command 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn
>> -- /bin/bash’ && ‘ping 172.18.0.5’
>> I can get response
>> But when I ping flink-jobmanager ,there is no response
>>
>> superainbower
>> [email protected]
>>
>> <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=superainbower&uid=superainbower%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22superainbower%40163.com%22%5D>
>> 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail81> 定制
>>
>> On 09/3/2020 09:03,superainbower<[email protected]>
>> <[email protected]> wrote:
>>
>> Hi Till,
>> This is the taskManager log
>> As you see, the logs print ‘line 92 -- Could not connect to
>> flink-jobmanager:6123’
>> then print ‘line 128 --Could not resolve ResourceManager address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.’
>> And repeat print this
>>
>> A few minutes later, the taskmanger shut down and restart
>>
>> This is my yaml files, could u help me to confirm did I
>> omitted something? Thanks a lot!
>> ---------------------------------------------------
>> flink-configuration-configmap.yaml
>> apiVersion: v1
>> kind: ConfigMap
>> metadata:
>> name: flink-config
>> labels:
>> app: flink
>> data:
>> flink-conf.yaml: |+
>> jobmanager.rpc.address: flink-jobmanager
>> taskmanager.numberOfTaskSlots: 1
>> blob.server.port: 6124
>> jobmanager.rpc.port: 6123
>> taskmanager.rpc.port: 6122
>> queryable-state.proxy.ports: 6125
>> jobmanager.memory.process.size: 1024m
>> taskmanager.memory.process.size: 1024m
>> parallelism.default: 1
>> log4j-console.properties: |+
>> rootLogger.level = INFO
>> rootLogger.appenderRef.console.ref = ConsoleAppender
>> rootLogger.appenderRef.rolling.ref = RollingFileAppender
>> logger.akka.name = akka
>> logger.akka.level = INFO
>> logger.kafka.name= org.apache.kafka
>> logger.kafka.level = INFO
>> logger.hadoop.name = org.apache.hadoop
>> logger.hadoop.level = INFO
>> logger.zookeeper.name = org.apache.zookeeper
>> logger.zookeeper.level = INFO
>> appender.console.name = ConsoleAppender
>> appender.console.type = CONSOLE
>> appender.console.layout.type = PatternLayout
>> appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>> %-60c %x - %m%n
>> appender.rolling.name = RollingFileAppender
>> appender.rolling.type = RollingFile
>> appender.rolling.append = false
>> appender.rolling.fileName = ${sys:log.file}
>> appender.rolling.filePattern = ${sys:log.file}.%i
>> appender.rolling.layout.type = PatternLayout
>> appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>> %-60c %x - %m%n
>> appender.rolling.policies.type = Policies
>> appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>> appender.rolling.policies.size.size=100MB
>> appender.rolling.strategy.type = DefaultRolloverStrategy
>> appender.rolling.strategy.max = 10
>> logger.netty.name =
>> org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>> logger.netty.level = OFF
>> ---------------------------------------------------
>> jobmanager-service.yaml
>> apiVersion: v1
>> kind: Service
>> metadata:
>> name: flink-jobmanager
>> spec:
>> type: ClusterIP
>> ports:
>> - name: rpc
>> port: 6123
>> - name: blob-server
>> port: 6124
>> - name: webui
>> port: 8081
>> selector:
>> app: flink
>> component: jobmanager
>> --------------------------------------------------
>> jobmanager-session-deployment.yaml
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>> name: flink-jobmanager
>> spec:
>> replicas: 1
>> selector:
>> matchLabels:
>> app: flink
>> component: jobmanager
>> template:
>> metadata:
>> labels:
>> app: flink
>> component: jobmanager
>> spec:
>> containers:
>> - name: jobmanager
>> image:
>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>> args: ["jobmanager"]
>> ports:
>> - containerPort: 6123
>> name: rpc
>> - containerPort: 6124
>> name: blob-server
>> - containerPort: 8081
>> name: webui
>> livenessProbe:
>> tcpSocket:
>> port: 6123
>> initialDelaySeconds: 30
>> periodSeconds: 60
>> volumeMounts:
>> - name: flink-config-volume
>> mountPath: /opt/flink/conf
>> securityContext:
>> runAsUser: 9999 # refers to user _flink_ from official flink
>> image, change if necessary
>> volumes:
>> - name: flink-config-volume
>> configMap:
>> name: flink-config
>> items:
>> - key: flink-conf.yaml
>> path: flink-conf.yaml
>> - key: log4j-console.properties
>> path: log4j-console.properties
>> imagePullSecrets:
>> - name: regcred
>> ---------------------------------------------------
>> taskmanager-session-deployment.yaml
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>> name: flink-taskmanager
>> spec:
>> replicas: 1
>> selector:
>> matchLabels:
>> app: flink
>> component: taskmanager
>> template:
>> metadata:
>> labels:
>> app: flink
>> component: taskmanager
>> spec:
>> containers:
>> - name: taskmanager
>> image:
>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>> args: ["taskmanager"]
>> ports:
>> - containerPort: 6122
>> name: rpc
>> - containerPort: 6125
>> name: query-state
>> livenessProbe:
>> tcpSocket:
>> port: 6122
>> initialDelaySeconds: 30
>> periodSeconds: 60
>> volumeMounts:
>> - name: flink-config-volume
>> mountPath: /opt/flink/conf/
>> securityContext:
>> runAsUser: 9999 # refers to user _flink_ from official flink
>> image, change if necessary
>> volumes:
>> - name: flink-config-volume
>> configMap:
>> name: flink-config
>> items:
>> - key: flink-conf.yaml
>> path: flink-conf.yaml
>> - key: log4j-console.properties
>> path: log4j-console.properties
>> imagePullSecrets:
>> - name: regcred
>>
>>
>> superainbower
>> [email protected]
>>
>> <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=superainbower&uid=superainbower%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22superainbower%40163.com%22%5D>
>> 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail81> 定制
>>
>> On 09/2/2020 20:38,Till Rohrmann<[email protected]>
>> <[email protected]> wrote:
>>
>> Hmm, this is indeed strange. Could you share the logs of the TaskManager
>> with us? Ideally you set the log level to debug. Thanks a lot.
>>
>> Cheers,
>> Till
>>
>> On Wed, Sep 2, 2020 at 12:45 PM art <[email protected]> wrote:
>>
>>> Hi Till,
>>>
>>> The full information when I run command ' kubectl get all’ like this:
>>>
>>> NAME READY STATUS RESTARTS AGE
>>> pod/flink-jobmanager-85bdbd98d8-ppjmf 1/1 Running 0
>>> 2m34s
>>> pod/flink-taskmanager-74c68c6f48-6jb5v 1/1 Running 0
>>> 2m34s
>>>
>>> NAME TYPE CLUSTER-IP EXTERNAL-IP
>>> PORT(S) AGE
>>> service/flink-jobmanager ClusterIP 10.103.207.75 <none>
>>> 6123/TCP,6124/TCP,8081/TCP 2m34s
>>> service/kubernetes ClusterIP 10.96.0.1 <none>
>>> 443/TCP 5d2h
>>>
>>> NAME READY UP-TO-DATE AVAILABLE AGE
>>> deployment.apps/flink-jobmanager 1/1 1 1
>>> 2m34s
>>> deployment.apps/flink-taskmanager 1/1 1 1
>>> 2m34s
>>>
>>> NAME DESIRED CURRENT READY
>>> AGE
>>> replicaset.apps/flink-jobmanager-85bdbd98d8 1 1 1
>>> 2m34s
>>> replicaset.apps/flink-taskmanager-74c68c6f48 1 1 1
>>> 2m34s
>>>
>>> And I can open flink ui but the task manger is 0 ,so the job manger is
>>> work well
>>> I think the problem is taksmanger can not register itself to jobmanger,
>>> did I miss some configure?
>>>
>>>
>>> 在 2020年9月2日,下午5:24,Till Rohrmann <[email protected]> 写道:
>>>
>>> Hi art,
>>>
>>> could you check what `kubectl get services` returns? Usually if you run
>>> `kubectl get all` you should also see the services. But in your case there
>>> are no services listed. You have see something like
>>> service/flink-jobmanager otherwise the flink-jobmanager service (K8s
>>> service) is not running.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Sep 2, 2020 at 11:15 AM art <[email protected]> wrote:
>>>
>>>> Hi Till,
>>>>
>>>> I’m sure the job manager-service is started, I can find it in
>>>> Kubernetes DashBoard
>>>>
>>>> When I run command ' kubectl get deployment’ I can got this:
>>>> flink-jobmanager 1/1 1 1 33s
>>>> flink-taskmanager 1/1 1 1 33s
>>>>
>>>> When I run command ' kubectl get all’ I can got this:
>>>> NAME READY STATUS RESTARTS
>>>> AGE
>>>> pod/flink-jobmanager-85bdbd98d8-ppjmf 1/1 Running 0
>>>> 2m34s
>>>> pod/flink-taskmanager-74c68c6f48-6jb5v 1/1 Running 0
>>>> 2m34s
>>>>
>>>> So, I think flink-jobmanager works well, but taskmannger is restarted
>>>> every few minutes
>>>>
>>>> My minikube version: v1.12.3
>>>> Flink version:v1.11.1
>>>>
>>>> 在 2020年9月2日,下午4:27,Till Rohrmann <[email protected]> 写道:
>>>>
>>>> Hi art,
>>>>
>>>> could you verify that the jobmanager-service has been started? It looks
>>>> as if the name flink-jobmanager is not resolvable. It could also help to
>>>> know the Minikube and K8s version you are using.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Sep 2, 2020 at 9:50 AM art <[email protected]> wrote:
>>>>
>>>>> Hi,I’m going to deploy flink on minikube referring to
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
>>>>> ;
>>>>> kubectl create -f flink-configuration-configmap.yaml
>>>>> kubectl create -f jobmanager-service.yaml
>>>>> kubectl create -f jobmanager-session-deployment.yaml
>>>>> kubectl create -f taskmanager-session-deployment.yaml
>>>>>
>>>>> But I got this
>>>>>
>>>>> 2020-09-02 06:45:42,664 WARN akka.remote.ReliableDeliverySupervisor
>>>>> [] - Association with remote system [
>>>>> akka.tcp://flink@flink-jobmanager:6123] has failed, address is now
>>>>> gated for [50] ms. Reason: [Association failed with [
>>>>> akka.tcp://flink@flink-jobmanager:6123]] Caused by:
>>>>> [java.net.UnknownHostException: flink-jobmanager: Temporary failure in
>>>>> name
>>>>> resolution]
>>>>> 2020-09-02 06:45:42,691 INFO
>>>>> org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could
>>>>> not resolve ResourceManager address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>> 2020-09-02 06:46:02,731 INFO
>>>>> org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could
>>>>> not resolve ResourceManager address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>> 2020-09-02 06:46:12,731 INFO akka.remote.transport.ProtocolStateActor
>>>>> [] - No response from remote for outbound association.
>>>>> Associate timed out after [20000 ms].
>>>>>
>>>>> And when I run the command 'kubectl exec -ti
>>>>> flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash’ && ‘ping
>>>>> flink-jobmanager’
>>>>> , I find I cannot ping flink-jobmanager from taskmanager
>>>>>
>>>>> I am new to k8s, can anyone give me some tutorial? Thanks a lot !
>>>>>
>>>>
>>>>
>>>