[ https://issues.apache.org/jira/browse/FLINK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007843#comment-16007843 ]

ASF GitHub Bot commented on FLINK-6284:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116185935
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,11 +346,7 @@ public int exists(String pathInZooKeeper) throws Exception {
                        } else {
                                // Initial cVersion (number of changes to the children of this node)
                                int initialCVersion = stat.getCversion();
    -
    -                           List<String> children = ZKPaths.getSortedChildren(
    -                                           client.getZookeeperClient().getZooKeeper(),
    -                                           ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    +                           List<String> children = client.getZookeeperClient().getZooKeeper().getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    --- End diff --
    
    I think this alone does not work. The JavaDocs of `ZooKeeper#getChildren` say:
    
    > The list of children returned is not sorted and no guarantee is provided as to its natural or lexical order.
    
    Thus, I assume that it is not safe to simply return the list of children without any further processing.
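
A minimal sketch of the concern above, not the code in the pull request: if the children come back in arbitrary order, a method that promises name-sorted results has to sort them itself. The class `SortedChildren`, the method `sortedByName`, and the sample child names are hypothetical. A plain list stands in for the result of `ZooKeeper#getChildren`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedChildren {

    // Returns a name-sorted copy of the children; the input list is not modified.
    public static List<String> sortedByName(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Stand-in for an unordered result; the names are made up for illustration.
        List<String> unordered = Arrays.asList("node-c", "node-a", "node-b");
        System.out.println(sortedByName(unordered));
    }
}
```

Sorting a copy keeps the caller's list untouched, which matters if the same list is reused elsewhere.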


> Incorrect sorting of completed checkpoints in ZooKeeperCompletedCheckpointStore
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-6284
>                 URL: https://issues.apache.org/jira/browse/FLINK-6284
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Currently, all completed checkpoints are sorted by their paths when they are 
> recovered in {{ZooKeeperCompletedCheckpointStore}}. When the latest 
> checkpoint's id is not the largest in lexical order (e.g., "100" is smaller 
> than "99" in lexical order), Flink will not recover from the latest 
> completed checkpoint.
> The problem can be easily observed by setting the checkpoint ids in 
> {{ZooKeeperCompletedCheckpointStoreITCase#testRecover()}} to 99, 100 and 
> 101. 
> To fix the problem, we should explicitly sort the found checkpoints by 
> their checkpoint ids, without using 
> {{ZooKeeperStateHandleStore#getAllSortedByName()}}.
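
The ordering problem described in the issue can be reproduced without ZooKeeper. The sketch below is only an illustration, assuming checkpoint ids are available as decimal strings. The class `CheckpointOrdering` and both methods are hypothetical and are not part of Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CheckpointOrdering {

    // Sorts id strings as plain strings, which is what a sort by name does.
    public static List<String> sortLexically(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Sorts id strings by their numeric value, ascending.
    public static List<String> sortByCheckpointId(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingLong((String id) -> Long.parseLong(id)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("99", "100", "101");
        System.out.println(sortLexically(ids));      // [100, 101, 99]
        System.out.println(sortByCheckpointId(ids)); // [99, 100, 101]
    }
}
```

With the lexical order, the last element is "99", so code that picks the last entry as the latest checkpoint would select the wrong one. The numeric order puts "101" last.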



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
