[ https://issues.apache.org/jira/browse/KAFKA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461239#comment-17461239 ]

Shivakumar commented on KAFKA-13077:
------------------------------------

Hi [~junrao], we did not save the DumpLogSegments output during the incident,
but we were able to reproduce the error and captured the output below; it
should match the error described above.
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ ls -l
total 32
-rw-rw-r-- 1 kafka kafka        0 Dec  9 11:40 00000000000000000000.index
-rw-rw-r-- 1 kafka kafka      877 Dec  9 11:40 00000000000000000000.log
-rw-rw-r-- 1 kafka kafka       12 Dec  9 11:40 00000000000000000000.timeindex
-rw-rw-r-- 1 kafka kafka 10485760 Dec 13 13:20 00000000000004804743.index
-rw-rw-r-- 1 kafka kafka      207 Dec 11 21:01 00000000000004804743.log
-rw-rw-r-- 1 kafka kafka       10 Dec 11 21:01 00000000000004804743.snapshot
-rw-rw-r-- 1 kafka kafka 10485756 Dec 13 13:20 00000000000004804743.timeindex
-rw-rw-r-- 1 kafka kafka       10 Dec 13 13:11 00000000000004804745.snapshot
-rw-r--r-- 1 kafka kafka      132 Dec 13 13:20 leader-epoch-checkpoint
-rw-rw-r-- 1 kafka kafka       43 Dec  1 11:13 partition.metadata
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000004804743.index
Dumping 00000000000004804743.index
offset: 4804743 position: 0
Mismatches in :/var/lib/kafka/data/__consumer_offsets-46/00000000000004804743.index
  Index offset: 4804743, log offset: 4804744
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000004804743.log
Dumping 00000000000004804743.log
Starting offset: 4804743
baseOffset: 4804743 lastOffset: 4804744 count: 2 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 368 isTransactional: false isControl: false position: 0 CreateTime: 1639256468568 size: 207 magic: 2 compresscodec: NONE crc: 2267717758 isvalid: true
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000004804743.timeindex
Dumping 00000000000004804743.timeindex
timestamp: 1639256468568 offset: 4804744
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$ kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000004804745.snapshot
Dumping 00000000000004804745.snapshot
kafka [ /var/lib/kafka/data/__consumer_offsets-46 ]$
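
For anyone checking other segments for the same off-by-one, DumpLogSegments
also takes an --index-sanity-check option that only validates the index
instead of printing it. A rough sweep over a partition directory (paths
illustrative, matching the layout above) might look like:

# Sanity-check every index file in the partition dir; paths are illustrative.
for idx in /var/lib/kafka/data/__consumer_offsets-46/*.index; do
  kafka-run-class.sh kafka.tools.DumpLogSegments --index-sanity-check --files "$idx"
done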

> Replication failing after unclean shutdown of ZK and all brokers
> ----------------------------------------------------------------
>
>                 Key: KAFKA-13077
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13077
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Christopher Auston
>            Priority: Minor
>
> I am submitting this in the spirit of what can go wrong when an operator 
> violates the constraints Kafka depends on. I don't know if Kafka could or 
> should handle this more gracefully. I decided to file this issue because it 
> was easy to get the problem I'm reporting with Kubernetes StatefulSets (STS). 
> By "easy" I mean that I did not go out of my way to corrupt anything; I 
> simply was not careful when restarting ZK and brokers.
> I violated the constraints of keeping ZooKeeper stable and keeping at least 
> one in-sync replica running.
> I am running the bitnami/kafka helm chart on Amazon EKS.
> {quote}% kubectl get po kaf-kafka-0 -ojson |jq .spec.containers'[].image'
> "docker.io/bitnami/kafka:2.8.0-debian-10-r43"
> {quote}
> I started with 3 ZK instances and 3 brokers (both STS). I changed the 
> cpu/memory requests on both STS, and Kubernetes proceeded to restart the ZK 
> and Kafka instances at the same time. If I recall correctly, there were some 
> crashes and several restarts, but eventually all the instances were running 
> again. It's possible all ZK nodes and all brokers were unavailable at 
> various points.
> The problem I noticed was that two of the brokers were just continually 
> spitting out messages like:
> {quote}% kubectl logs kaf-kafka-0 --tail 10
> [2021-07-13 14:26:08,871] INFO [ProducerStateManager partition=__transaction_state-0] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-0/00000000000000000001.snapshot,1)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,871] WARN [Log partition=__transaction_state-0, dir=/bitnami/kafka/data] *Non-monotonic update of high watermark from (offset=2744 segment=[0:1048644]) to (offset=1 segment=[0:169])* (kafka.log.Log)
> [2021-07-13 14:26:08,874] INFO [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Truncating to offset 2 (kafka.log.Log)
> [2021-07-13 14:26:08,877] INFO [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Loading producer state till offset 2 with message format version 2 (kafka.log.Log)
> [2021-07-13 14:26:08,877] INFO [ProducerStateManager partition=__transaction_state-10] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-10/00000000000000000002.snapshot,2)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,877] WARN [Log partition=__transaction_state-10, dir=/bitnami/kafka/data] Non-monotonic update of high watermark from (offset=2930 segment=[0:1048717]) to (offset=2 segment=[0:338]) (kafka.log.Log)
> [2021-07-13 14:26:08,880] INFO [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Truncating to offset 1 (kafka.log.Log)
> [2021-07-13 14:26:08,882] INFO [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Loading producer state till offset 1 with message format version 2 (kafka.log.Log)
> [2021-07-13 14:26:08,882] INFO [ProducerStateManager partition=__transaction_state-20] Loading producer state from snapshot file 'SnapshotFile(/bitnami/kafka/data/__transaction_state-20/00000000000000000001.snapshot,1)' (kafka.log.ProducerStateManager)
> [2021-07-13 14:26:08,883] WARN [Log partition=__transaction_state-20, dir=/bitnami/kafka/data] Non-monotonic update of high watermark from (offset=2956 segment=[0:1048608]) to (offset=1 segment=[0:169]) (kafka.log.Log)
> {quote}
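> A rough way to gauge how many partitions are affected (illustrative; the 
> grep pattern just matches the WARN lines above):
> {quote}% kubectl logs kaf-kafka-0 | grep 'Non-monotonic update of high watermark' | grep -o 'partition=[^,]*' | sort -u
> {quote}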
> If I describe that topic, I can see that several partitions have a leader of 
> 2 and the ISR is just 2 (NOTE: I added two more brokers and tried to 
> reassign the topic onto brokers 2,3,4, which you can see below). The new 
> brokers also spit out the "non-monotonic update" messages just like the 
> original followers. This describe output is from the following day.
> {{% kafka-topics.sh ${=BS} -topic __transaction_state -describe}}
> {{Topic: __transaction_state TopicId: i7bBNCeuQMWl-ZMpzrnMAw PartitionCount: 50 ReplicationFactor: 3 Configs: compression.type=uncompressed,min.insync.replicas=3,cleanup.policy=compact,flush.ms=1000,segment.bytes=104857600,flush.messages=10000,max.message.bytes=1000012,unclean.leader.election.enable=false,retention.bytes=1073741824}}
> {{ Topic: __transaction_state Partition: 0 Leader: 2 Replicas: 4,3,2,1,0 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 1 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 2 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 3 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 4 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 5 Leader: 2 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 6 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 7 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 8 Leader: 2 Replicas: 3,2,4,0,1 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 0,1}}
> {{ Topic: __transaction_state Partition: 9 Leader: 2 Replicas: 4,2,3,1,0 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 10 Leader: 2 Replicas: 2,3,4,1,0 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 11 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 12 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 13 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 14 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 15 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 16 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 17 Leader: 2 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 18 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 19 Leader: 2 Replicas: 2,4,3,0,1 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 0,1}}
> {{ Topic: __transaction_state Partition: 20 Leader: 2 Replicas: 3,2,4,0,1 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 0,1}}
> {{ Topic: __transaction_state Partition: 21 Leader: 2 Replicas: 4,2,3,1,0 Isr: 2 Adding Replicas: 4,3 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 22 Leader: 2 Replicas: 2,3,4,1,0 Isr: 2 Adding Replicas: 3,4 Removing Replicas: 1,0}}
> {{ Topic: __transaction_state Partition: 23 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 24 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 25 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 26 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 27 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 28 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 29 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 30 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 31 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 32 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 33 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 34 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 35 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 36 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 37 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 38 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 39 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 40 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 41 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 42 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 43 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 44 Leader: 3 Replicas: 3,2,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 45 Leader: 4 Replicas: 4,2,3 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 46 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 47 Leader: 3 Replicas: 3,4,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 48 Leader: 4 Replicas: 4,3,2 Isr: 2,3,4}}
> {{ Topic: __transaction_state Partition: 49 Leader: 2 Replicas: 2,4,3 Isr: 2,3,4}}
>  
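> A quicker way to pick out just the stuck partitions from a describe like 
> this is the standard kafka-topics.sh filter for under-replicated partitions:
> {{% kafka-topics.sh ${=BS} --topic __transaction_state --describe --under-replicated-partitions}}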
> It seems something got corrupted and the followers will never make progress. 
> Even worse, the original followers appear to have truncated their copies, so 
> if the remaining leader replica is the corrupted one, it may have forced 
> truncation on replicas that had more valid data?
> Anyway, for what it's worth, this is something that happened to me. I plan 
> to change the StatefulSets to require manual restarts so I can control 
> rolling upgrades. It also seems to underscore the value of having a separate 
> Kafka cluster for disaster recovery.
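> For reference, a minimal way to get that manual control is the OnDelete 
> update strategy on the StatefulSet (a standard Kubernetes field; the 
> StatefulSet name here is assumed from the pod names above):
> {quote}% kubectl patch statefulset kaf-kafka --type merge -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
> {quote}
> With OnDelete, pods are replaced only when deleted explicitly, so editing 
> the spec no longer triggers an automatic rolling restart.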



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
