[ https://issues.apache.org/jira/browse/KAFKA-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941999#comment-14941999 ]
Jun Rao commented on KAFKA-2477:
--------------------------------

[~hakon], thanks a lot for the update. This seems like a real issue. The point you made about "Log.read() can potentially read nextOffsetMetadata multiple times" is also relevant. In Log.read(), we have the following code:

    // check if the offset is valid and in range
    val next = nextOffsetMetadata.messageOffset
    if(startOffset == next)
      return FetchDataInfo(nextOffsetMetadata, MessageSet.Empty)

This seems wrong. If nextOffsetMetadata changes after the if test, we could return a larger fetchOffsetMetadata in FetchDataInfo than we should. This could potentially affect the computation of things like the ISR. Instead, we should take a reference to nextOffsetMetadata first and use it both in the if test and as the return value.

Log.read() also references nextOffsetMetadata again in its last statement. I am not sure the accompanying comment is correct: the last message will never be deleted, so it seems we can never reach that statement.

    // okay we are beyond the end of the last segment with no data fetched although the start offset is in range,
    // this can happen when all messages with offset larger than start offsets have been deleted.
    // In this case, we will return the empty set with log end offset metadata
    FetchDataInfo(nextOffsetMetadata, MessageSet.Empty)

[~becket_qin], do you want to fix nextOffsetMetadata in your patch too?


> Replicas spuriously deleting all segments in partition
> -------------------------------------------------------
>
>                 Key: KAFKA-2477
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2477
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Håkon Hitland
>            Assignee: Jiangjie Qin
>             Fix For: 0.9.0.0
>
>         Attachments: kafka_log.txt, kafka_log_trace.txt
>
>
> We're seeing some strange behaviour in brokers: a replica will sometimes schedule all segments in a partition for deletion, and then immediately start replicating them back, triggering our check for under-replicated topics. This happens on average a couple of times a week, for different brokers and topics.
> We have per-topic retention.ms and retention.bytes configuration; the topics where we've seen this happen are hitting the size limit.
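
For illustration, here is a minimal sketch of the single-snapshot read suggested in the comment above. LogSketch and its stand-in types (LogOffsetMetadata, FetchDataInfo) are simplified placeholders, not the actual kafka.log classes; the only point is that nextOffsetMetadata is read exactly once and the same snapshot is used for both the range check and the returned value.

    // Minimal sketch with stand-in types; not the real kafka.log.Log code.
    case class LogOffsetMetadata(messageOffset: Long)
    case class FetchDataInfo(fetchOffsetMetadata: LogOffsetMetadata, messageSet: Seq[Array[Byte]])

    class LogSketch {
      // advanced by the append path; volatile so readers always see a recent, complete value
      @volatile private var nextOffsetMetadata = LogOffsetMetadata(0L)

      def read(startOffset: Long): FetchDataInfo = {
        // take a single snapshot and reuse it below, so the range check and the
        // returned metadata cannot disagree even if an append happens concurrently
        val currentNextOffsetMetadata = nextOffsetMetadata
        val next = currentNextOffsetMetadata.messageOffset

        if (startOffset == next)
          return FetchDataInfo(currentNextOffsetMetadata, Seq.empty)

        // ... segment lookup and the actual read are elided ...

        // return the same snapshot rather than re-reading nextOffsetMetadata,
        // which may have advanced in the meantime
        FetchDataInfo(currentNextOffsetMetadata, Seq.empty)
      }
    }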