[ 
https://issues.apache.org/jira/browse/FLINK-21472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295887#comment-17295887
 ] 

Peng Zhang commented on FLINK-21472:
------------------------------------

[~fly_in_gis] Thanks! I will try Flink 1.12.2 once it is available in Docker. 
For more information: in our case the FencingTokenException happened when a 
JobManager was redeployed to another node by K8s, and the new JobManager could 
not start the jobs from checkpoints:

{{2021-03-04 17:04:44,928 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from KubernetesStateHandleStore\{configMapName='stellar-flink-cluster-8ea8bb860bdefc3884cd586f4473295a-jobmanager-leader'}.
2021-03-04 17:04:44,928 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from KubernetesStateHandleStore\{configMapName='stellar-flink-cluster-8ea8bb860bdefc3884cd586f4473295a-jobmanager-leader'}.
2021-03-04 17:04:44,933 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Found 1 checkpoints in KubernetesStateHandleStore\{configMapName='stellar-flink-cluster-8ea8bb860bdefc3884cd586f4473295a-jobmanager-leader'}.
2021-03-04 17:04:44,933 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to fetch 1 checkpoints from storage.
2021-03-04 17:04:44,933 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to retrieve checkpoint 18.
2021-03-04 17:04:44,963 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Restoring job 8ea8bb860bdefc3884cd586f4473295a from Checkpoint 18 @ 1614877356663 for 8ea8bb860bdefc3884cd586f4473295a located at s3a://zalando-stellar-flink-state-eu-central-1-staging/checkpoints/8ea8bb860bdefc3884cd586f4473295a/chk-18.
2021-03-04 17:04:44,964 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - No master state to restore
2021-03-04 17:04:44,965 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@530feb4d for BrandCollectionTrackingJob (8ea8bb860bdefc3884cd586f4473295a).
2021-03-04 17:04:44,970 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl      [] - JobManager runner for job BrandCollectionTrackingJob (8ea8bb860bdefc3884cd586f4473295a) was granted leadership with session id ecb717f4-089f-48af-8d82-63333f7d4b17 at akka.tcp://flink@stellar-flink-jobmanager:6123/user/rpc/jobmanager_4.
2021-03-04 17:05:09,618 WARN  io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Exec Failure java.net.SocketTimeoutException: sent ping but didn't receive pong within 30000ms (after 1 successful ping/pongs)
2021-03-04 17:05:14,990 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler [] - Unhandled exception. org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token mismatch: Ignoring message LocalFencedMessage(9c31a87cf2ff475d049819f3fb9e4cd7, LocalRpcInvocation(requestMultipleJobDetails(Time))) because the fencing token 9c31a87cf2ff475d049819f3fb9e4cd7 did not match the expected fencing token bbc60d6ee1cc9717561f755149454d94.}}

> FencingTokenException: Fencing token mismatch
> ---------------------------------------------
>
>                 Key: FLINK-21472
>                 URL: https://issues.apache.org/jira/browse/FLINK-21472
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.1
>            Reporter: hayden zhou
>            Priority: Major
>         Attachments: 
> flink--standalonesession-0-mta-flink-jobmanager-864d6c8cbb-rmsxw.log
>
>
> org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler [] - Unhandled 
> exception.
> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token 
> mismatch: Ignoring message 
> LocalFencedMessage(8fac01d8e3e3988223a2e5c6e3f04f1e, 
> LocalRpcInvocation(requestMultipleJobDetails(Time))) because the fencing 
> token 8fac01d8e3e3988223a2e5c6e3f04f1e did not match the expected fencing 
> token 8c37414f464bca76144e6cabc946474b.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
