[ https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148802#comment-14148802 ]

Joel Koshy commented on KAFKA-1647:
-----------------------------------

Yes, that is one possibility, but not the only one. For example, suppose this is 
a topic with replication factor three. This typically comes up when bringing up a 
cluster that was previously hard-killed. Suppose b1 and b2 are brought up 
simultaneously and lose their HW as described above, and the controller then 
elects b1 as the leader. b2 then becomes the follower (successfully), but as part 
of that transition it will truncate its log to zero. I think there are more 
scenarios, since I also saw this with a topic with replication factor two, but I 
have not yet checked the logs to see whether that was due to a subsequent bounce 
or something else.
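
A minimal, self-contained Scala sketch of the sequence above (not Kafka's actual 
code; the Replica class, broker ids, and offsets are hypothetical): both replicas 
lose their checkpointed HW, so the follower treats its HW as zero and truncates 
its entire log on the become-follower transition.

{code:scala}
// Sketch only: models the pre-fix behavior where a follower truncates its log
// to its local high watermark, and a lost HW checkpoint is treated as 0.
object BecomeFollowerSketch {
  // Hypothetical replica state: log end offset plus the recovered HW, if any.
  case class Replica(id: Int, logEndOffset: Long, checkpointedHw: Option[Long]) {
    def highWatermark: Long = checkpointedHw.getOrElse(0L) // lost checkpoint => 0

    // Becoming a follower truncates the log to the local high watermark.
    def becomeFollower(): Replica =
      copy(logEndOffset = math.min(logEndOffset, highWatermark))
  }

  def main(args: Array[String]): Unit = {
    // b2 was hard-killed (along with b1) and lost its HW checkpoint.
    val b2 = Replica(id = 2, logEndOffset = 4800L, checkpointedHw = None)

    // The controller elects b1 as leader; b2 follows and truncates to HW = 0.
    val b2AsFollower = b2.becomeFollower()
    println(s"b2 log end offset after become-follower: ${b2AsFollower.logEndOffset}")
    // Prints 0: if b2 is later elected leader, nearly all of the data is gone.
  }
}
{code}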

> Replication offset checkpoints (high water marks) can be lost on hard kills 
> and restarts
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1647
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1647
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joel Koshy
>            Priority: Critical
>              Labels: newbie++
>
> We ran into this scenario recently in a production environment. This can 
> happen when enough brokers in a cluster are taken down. i.e., a rolling 
> bounce done properly should not cause this issue. It can occur if all 
> replicas for any partition are taken down.
> Here is a sample scenario:
> * Cluster of three brokers: b0, b1, b2
> * Two partitions (of some topic) with replication factor two: p0, p1
> * Initial state:
> ** p0: leader = b0, ISR = {b0, b1}
> ** p1: leader = b1, ISR = {b0, b1}
> * Do a parallel hard-kill of all brokers
> * Bring up b2, so it is the new controller
> * b2 initializes its controller context and populates its leader/ISR cache 
> (i.e., controllerContext.partitionLeadershipInfo) from ZooKeeper. The last 
> known leaders are b0 (for p0) and b1 (for p1).
> * Bring up b1
> * The controller's onBrokerStartup procedure initiates a replica state change 
> for all replicas on b1 to become online. As part of this replica state change 
> it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 
> (for p0 and p1). This LeaderAndIsrRequest contains: {{p0: leader=b0; p1: 
> leader=b1; leaders=[b1]}}. b0 is indicated as the leader of p0 but it is not 
> included in the leaders field because b0 is down.
> * On receiving the LeaderAndIsrRequest, b1's replica manager will 
> successfully make b1 the leader for p1 (and create the local replica object 
> corresponding to p1). It will however abort the become-follower transition 
> for p0 because the designated leader b0 is offline. So it will not create the 
> local replica object for p0.
> * It will then start the high water mark checkpoint thread. Since only p1 has 
> a local replica object, only p1's high water mark will be checkpointed to 
> disk. p0's previously written checkpoint, if any, will be lost (see the 
> sketch below).
> So in summary, it seems we should always create the local replica object even 
> if the online transition does not happen.
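> A minimal Scala sketch of this (not Kafka's ReplicaManager; the partition 
> names and offsets are made up): the checkpoint thread rewrites the file from 
> the set of local replica objects, so a partition without one silently drops 
> out of the checkpoint.
>
> {code:scala}
> // Sketch only: rewriting the HW checkpoint file from in-memory replicas alone
> // loses the on-disk entry for any partition whose replica object was not created.
> object CheckpointSketch {
>   type TopicPartition = String
>
>   def main(args: Array[String]): Unit = {
>     // Checkpoint file recovered from disk after the restart (hypothetical values).
>     val onDisk = Map[TopicPartition, Long]("t-p0" -> 1200L, "t-p1" -> 900L)
>
>     // Only p1 got a local replica object; the become-follower transition for p0
>     // was aborted because its designated leader (b0) is down.
>     val localReplicaHws = Map[TopicPartition, Long]("t-p1" -> 950L)
>
>     // The checkpoint thread overwrites the file with the HW of each local replica.
>     val rewritten = localReplicaHws
>     println(s"before: $onDisk")    // contains t-p0
>     println(s"after:  $rewritten") // t-p0's checkpoint is gone
>
>     // With the suggested fix (always create the local replica object, seeding its
>     // HW from the recovered checkpoint), t-p0 would survive the rewrite.
>     val withFix = localReplicaHws ++ Map("t-p0" -> onDisk("t-p0"))
>     println(s"fixed:  $withFix")
>   }
> }
> {code}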
> Possible symptoms of the above bug could be one or more of the following (we 
> saw 2 and 3):
> # Data loss: yes, on a hard kill some data loss is expected, but this can 
> actually cause loss of nearly all data if the broker becomes a follower, 
> truncates to zero, and soon after happens to become the leader.
> # High IO on brokers that lose their high water mark and then (on a 
> successful become-follower transition) truncate their log to zero and start 
> catching up from the beginning.
> # If the offsets topic is affected, then offsets can get reset. This is 
> because during an offset load we don't read past the high water mark. So if a 
> high water mark is missing, then we don't load anything, even if the offsets 
> are there in the log (see the sketch below).
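> A minimal Scala sketch of symptom 3 above (not the actual offset manager; the 
> record layout and values are made up): the offsets-topic load only reads 
> records below the high water mark, so a lost (zero) HW loads nothing and 
> consumer offsets appear reset.
>
> {code:scala}
> // Sketch only: loading consumer offsets bounded by the high watermark.
> object OffsetLoadSketch {
>   case class OffsetRecord(group: String, partition: Int, offset: Long)
>
>   // Committed offsets sitting in the offsets-topic log (hypothetical).
>   val log = Vector(
>     OffsetRecord("group-a", 0, 42L),
>     OffsetRecord("group-a", 1, 17L),
>     OffsetRecord("group-b", 0, 99L)
>   )
>
>   // Load offsets into the cache, reading only records below the high watermark.
>   def loadOffsets(highWatermark: Long): Map[(String, Int), Long] =
>     log.take(highWatermark.toInt)
>       .map(r => (r.group, r.partition) -> r.offset)
>       .toMap
>
>   def main(args: Array[String]): Unit = {
>     println(loadOffsets(highWatermark = 3L)) // all committed offsets recovered
>     println(loadOffsets(highWatermark = 0L)) // empty map: offsets look reset
>   }
> }
> {code}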



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
