[ https://issues.apache.org/jira/browse/KAFKA-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091917#comment-17091917 ]
Igor Baltiyskiy commented on KAFKA-9912: ---------------------------------------- I couldn't reproduce it, and now is blocked by KAFKA-9917. However, while I was at it, I saw that the timestamps were fine. I didn't change the configuration, so I doubt that it should be sufficiently different from the time when it reproduced. Could it be that the broker's clock somehow got weirdly skewed during startup? > Kafka deletes full log on startup written 30mins ago - wrongly claims > retention time 7d breach > ---------------------------------------------------------------------------------------------- > > Key: KAFKA-9912 > URL: https://issues.apache.org/jira/browse/KAFKA-9912 > Project: Kafka > Issue Type: Bug > Components: log cleaner > Affects Versions: 2.1.1 > Environment: Image > solsson/kafka:2.1.1@sha256:8bc8242c649c395ab79d76cc83b1052e63b4efea7f83547bf11eb3ef5ea6f8e1 > deployed in Kubernetes v 1.12.7 (minikube v1.4.0) > 1 broker > Reporter: Igor Baltiyskiy > Priority: Major > Attachments: kafka-logs.txt > > > This isn't easily reproduced, so I'll try to explain how I got there as > accurately as possible. I've been able to observe this 2 times now. > I was experimenting with the resilience of my Kafka-based app by shutting > down Kafka via {{kubectl scale --replicas 0}} — as far as I understand, it > sends {{SIGTERM}} to PID 1 of this container, which is {{java ... > kafka.Kafka}}. I've got it into state where {{kafka-console-consumer.sh}} > cannot read anything from some topics, but can from some others. > So, this works: > {code:java} > bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic > increasing-topic-0 --partition 0 --from-beginning --max-messages 5 > 1 > 2 > 3 > 4 > 5 > Processed a total of 5 messages {code} > And this command just hangs: > {code:java} > bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic > increasing-topic-7 --partition 0 --from-beginning --max-messages 5 {code} > Even though I know there should be plenty of messages in the topic, produced > not more than an hour ago: > {code:java} > root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh > kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-0 > increasing-topic-0:0:500000 > root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh > kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-7 > increasing-topic-7:0:321455 {code} > What I see in the broker logs: > {code:java} > [2020-04-23 19:20:10,706] INFO [Log partition=increasing-topic-7-0, > dir=/var/lib/kafka/data/topics] Found deletable segments with base offsets > [0] due to retention time 604800000ms breach (kafka.log.Log) > [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, > dir=/var/lib/kafka/data/topics] Rolled new log segment at offset 321455 in 5 > ms. (kafka.log.Log) > [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, > dir=/var/lib/kafka/data/topics] Scheduling log segment [baseOffset 0, size > 4513814] for deletion. (kafka.log.Log) > [2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0, > dir=/var/lib/kafka/data/topics] Incrementing log start offset to 321455 > (kafka.log.Log) > [2020-04-23 19:21:10,712] INFO [Log partition=increasing-topic-7-0, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log) > [2020-04-23 19:21:10,713] INFO Deleted log > /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.log.deleted. > (kafka.log.LogSegment) > [2020-04-23 19:21:10,713] INFO Deleted offset index > /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.index.deleted. > (kafka.log.LogSegment) > [2020-04-23 19:21:10,713] INFO Deleted time index > /var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.timeindex.deleted. > (kafka.log.LogSegment){code} > As you see, Kafka deletes up to 321455 messages — exactly how many there are > in this topic. > I wouldn't expect Kafka to do that. > Note that in my test, I ran a script that would switch Kafka on and then > after a random period of time off. Maybe that was what caused this bug to > happen. > For the record, that's how much timed passed between on/off switches: > {code:java} > sleeping 29575 ms before switching Kafka off > statefulset.apps/kafka-a scaled > Kafka switched off > sleeping 257347 ms before switching Kafka on > statefulset.apps/kafka-a scaled > kafka switched on > sleeping 6388 ms before switching Kafka off > statefulset.apps/kafka-a scaled > Kafka switched off > sleeping 214066 ms before switching Kafka on > statefulset.apps/kafka-a scaled > kafka switched on > sleeping 1095 ms before switching Kafka off > statefulset.apps/kafka-a scaled > Kafka switched off > sleeping 155721 ms before switching Kafka on > statefulset.apps/kafka-a scaled > kafka switched on > sleeping 8689 ms before switching Kafka off > statefulset.apps/kafka-a scaled > Kafka switched off > sleeping 188854 ms before switching Kafka on > statefulset.apps/kafka-a scaled > kafka switched on > sleeping 39113 ms before switching Kafka off{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)