[ 
https://issues.apache.org/jira/browse/FLINK-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923874#comment-15923874
 ] 

ASF GitHub Bot commented on FLINK-6006:
---------------------------------------

Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3505#discussion_r105860440
  
    --- Diff: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java ---
    @@ -489,16 +486,15 @@ public void notifyCheckpointComplete(long checkpointId) throws Exception {
        //  Utilities
        // ------------------------------------------------------------------------
     
    -   private void assignTopicPartitions(List<KafkaTopicPartition> kafkaTopicPartitions) {
    -           subscribedPartitions = new ArrayList<>();
    -
    +   private void assignTopicPartitions() {
                if (restoreToOffset != null) {
    -                   for (KafkaTopicPartition kafkaTopicPartition : kafkaTopicPartitions) {
    -                           if (restoreToOffset.containsKey(kafkaTopicPartition)) {
    -                                   subscribedPartitions.add(kafkaTopicPartition);
    -                           }
    +                   subscribedPartitions = new ArrayList<>(restoreToOffset.size());
    +                   for (Map.Entry<KafkaTopicPartition, Long> restoredPartitionState : restoreToOffset.entrySet()) {
    +                           subscribedPartitions.add(restoredPartitionState.getKey());
    --- End diff --
    
    (Note about your comment --->) The subscribed partitions will always be 
identical to the restored state, if there is any.
    
    I should actually just change this to `subscribedPartitions = new 
ArrayList<>(restoreToOffset.keySet());` to make it clearer.
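
As a rough illustration of the simplification suggested above, the sketch below compares the loop from the diff with the copy-constructor form. It is not the connector code: `SubscribedPartitionsSketch`, `viaLoop`, and `viaKeySet` are hypothetical names, and the key type is left generic as a stand-in for `KafkaTopicPartition`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only (not part of Flink).
final class SubscribedPartitionsSketch {

    // Loop form, mirroring the diff: copy every key of the restored state.
    static <K> List<K> viaLoop(Map<K, Long> restoreToOffset) {
        List<K> subscribed = new ArrayList<>(restoreToOffset.size());
        for (Map.Entry<K, Long> restoredPartitionState : restoreToOffset.entrySet()) {
            subscribed.add(restoredPartitionState.getKey());
        }
        return subscribed;
    }

    // Copy-constructor form: builds the list directly from the key set.
    static <K> List<K> viaKeySet(Map<K, Long> restoreToOffset) {
        return new ArrayList<>(restoreToOffset.keySet());
    }
}
```

Both methods return a list holding exactly the keys of the restored state, so the shorter form should be a drop-in replacement for the loop.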


> Kafka Consumer can lose state if queried partition list is incomplete on 
> restore
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-6006
>                 URL: https://issues.apache.org/jira/browse/FLINK-6006
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector, Streaming Connectors
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Blocker
>             Fix For: 1.1.5, 1.2.1
>
>
> In 1.1.x and 1.2.x, the FlinkKafkaConsumer queries the partition list on 
> restore. Only the restored state of partitions that exist in the queried 
> list is then used to initialize the fetcher's state holders.
> If the returned partition list is incomplete for any reason (i.e. missing 
> partitions that existed before, perhaps due to temporary ZK / broker 
> downtime), the state of the missing partitions is dropped and can no 
> longer be recovered.
> In 1.3-SNAPSHOT, this is fixed by the changes in FLINK-4280, so only 1.1 and 
> 1.2 are affected.
> We can backport some of the behavioural changes there to 1.1 and 1.2. 
> Generally, we should not depend on the current partition list in Kafka when 
> restoring, but just restore all previous state into the fetcher's state 
> holders. 
> This would therefore also require some checking of how the consumer threads / 
> Kafka clients behave when their assigned partitions cannot be reached.
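
The two restore strategies described in the issue can be contrasted with a minimal sketch. It is not the actual consumer implementation: `RestoreSketch` and its methods are hypothetical, and partitions are modelled as plain strings rather than `KafkaTopicPartition`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the restore step, for illustration only.
final class RestoreSketch {

    // Behaviour described for 1.1.x / 1.2.x: keep restored offsets only for
    // partitions that appear in the freshly queried partition list.
    static Map<String, Long> restoreFilteredByQuery(
            Map<String, Long> restoredOffsets, Set<String> queriedPartitions) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : restoredOffsets.entrySet()) {
            if (queriedPartitions.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
            // Otherwise the offset is silently dropped, which is the reported bug.
        }
        return result;
    }

    // Proposed behaviour: restore all previous state, independent of the
    // partition list Kafka currently reports.
    static Map<String, Long> restoreAll(Map<String, Long> restoredOffsets) {
        return new HashMap<>(restoredOffsets);
    }
}
```

If the queried list is missing a partition that has restored state, the first method loses that partition's offset while the second keeps it.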



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
