[ 
https://issues.apache.org/jira/browse/FLINK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008015#comment-16008015
 ] 

ASF GitHub Bot commented on FLINK-6284:
---------------------------------------

Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116211254
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,17 +346,20 @@ public int exists(String pathInZooKeeper) throws Exception {
                        } else {
                                // Initial cVersion (number of changes to the children of this node)
                                int initialCVersion = stat.getCversion();
    -
    -                           List<String> children = ZKPaths.getSortedChildren(
    -                                           client.getZookeeperClient().getZooKeeper(),
    -                                           ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    -                           for (String path : children) {
    -                                   path = "/" + path;
    +                           List<String> childrenInStr =
    +                                   client.getZookeeperClient().getZooKeeper().
    +                                           getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    +                           List<Long> children = new ArrayList<Long>(childrenInStr.size());
    +                           for(String childNode : childrenInStr) {
    +                                   children.add(new Long(childNode));
    --- End diff --
    
    OK, I see. I am not sure about this MesosWorker. As for using cxid, I am not
    sure whether we have an API for it. If so, we can use it directly. Will be back.


> Incorrect sorting of completed checkpoints in 
> ZooKeeperCompletedCheckpointStore
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-6284
>                 URL: https://issues.apache.org/jira/browse/FLINK-6284
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Currently, all completed checkpoints are sorted by their paths when they are 
> recovered in {{ZooKeeperCompletedCheckpointStore}}. When the latest 
> checkpoint's id is not the largest in lexical order (e.g., "100" is 
> smaller than "99" in lexical order), Flink will not recover from the latest 
> completed checkpoint.
> The problem can be easily observed by setting the checkpoint ids in 
> {{ZooKeeperCompletedCheckpointStoreITCase#testRecover()}} to 99, 100 and 
> 101. 
> To fix the problem, we should explicitly sort the found checkpoints by their 
> checkpoint ids, without using 
> {{ZooKeeperStateHandleStore#getAllSortedByName()}}.
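
To illustrate the ordering problem described in the issue, here is a minimal, self-contained sketch in plain Java. It does not use Flink or ZooKeeper code: the class name CheckpointOrderSketch and its helper methods are hypothetical, and it assumes that the last path segment is the numeric checkpoint id, which may not match the actual path layout in Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CheckpointOrderSketch {

    // Hypothetical helper: parses the last path segment as a checkpoint id.
    // Assumes the segment is a plain decimal number.
    static long checkpointIdOf(String path) {
        return Long.parseLong(path.substring(path.lastIndexOf('/') + 1));
    }

    // Sorts paths as strings, which is what sorting by name amounts to.
    static List<String> sortLexically(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        Collections.sort(sorted);
        return sorted;
    }

    // Sorts paths by the numeric checkpoint id taken from each path.
    static List<String> sortByCheckpointId(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(Comparator.comparingLong(CheckpointOrderSketch::checkpointIdOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/99", "/100", "/101");
        // Lexical order puts "/99" last, so it would be taken as the latest.
        System.out.println(sortLexically(paths));
        // Numeric order puts "/101" last, which is the actual latest checkpoint.
        System.out.println(sortByCheckpointId(paths));
    }
}
```

With the ids 99, 100 and 101 from the issue, the lexical sort ends with "/99" while the numeric sort ends with "/101", which is why recovery would pick an older checkpoint when sorting by name.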



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
