Igor Baltiyskiy created KAFKA-9912:
--------------------------------------
Summary: Kafka deletes full log on startup written 30mins ago -
wrongly claims retention time 7d breach
Key: KAFKA-9912
URL: https://issues.apache.org/jira/browse/KAFKA-9912
Project: Kafka
Issue Type: Bug
Components: log cleaner
Affects Versions: 2.1.1
Environment: Image
solsson/kafka:2.1.1@sha256:8bc8242c649c395ab79d76cc83b1052e63b4efea7f83547bf11eb3ef5ea6f8e1
deployed in Kubernetes v 1.12.7 (minikube v1.4.0)
1 broker
Reporter: Igor Baltiyskiy
Attachments: kafka-logs.txt
This isn't easily reproduced, so bear with me.
I was experimenting with the resilience of my Kafka-based app by shutting down
Kafka via {{kubectl scale --replicas 0}} — as far as I understand, it sends
{{SIGTERM}} to PID 1 of this container, which is {{java ... kafka.Kafka}}. I've
got it into state where {{kafka-console-consumer.sh}} cannot read anything from
some topics, but can from some others.
So, this works:
{code:java}
bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic
increasing-topic-0 --partition 0 --from-beginning --max-messages 5
1
2
3
4
5
Processed a total of 5 messages {code}
And this command just hangs:
{code:java}
bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic
increasing-topic-7 --partition 0 --from-beginning --max-messages 5 {code}
Even though I know there should be plenty of messages in the topic, produced
not more than an hour ago:
{code:java}
root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh
kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-0
increasing-topic-0:0:500000
root@kafka-client-solsson:/opt/kafka# bin/kafka-run-class.sh
kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic increasing-topic-7
increasing-topic-7:0:321455 {code}
What I see in the broker logs:
{code:java}
[2020-04-23 19:20:10,706] INFO [Log partition=increasing-topic-7-0,
dir=/var/lib/kafka/data/topics] Found deletable segments with base offsets [0]
due to retention time 604800000ms breach (kafka.log.Log)
[2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0,
dir=/var/lib/kafka/data/topics] Rolled new log segment at offset 321455 in 5
ms. (kafka.log.Log)
[2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0,
dir=/var/lib/kafka/data/topics] Scheduling log segment [baseOffset 0, size
4513814] for deletion. (kafka.log.Log)
[2020-04-23 19:20:10,712] INFO [Log partition=increasing-topic-7-0,
dir=/var/lib/kafka/data/topics] Incrementing log start offset to 321455
(kafka.log.Log)
[2020-04-23 19:21:10,712] INFO [Log partition=increasing-topic-7-0,
dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)
[2020-04-23 19:21:10,713] INFO Deleted log
/var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.log.deleted.
(kafka.log.LogSegment)
[2020-04-23 19:21:10,713] INFO Deleted offset index
/var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.index.deleted.
(kafka.log.LogSegment)
[2020-04-23 19:21:10,713] INFO Deleted time index
/var/lib/kafka/data/topics/increasing-topic-7-0/00000000000000000000.timeindex.deleted.
(kafka.log.LogSegment){code}
As you see, Kafka deletes up to 321455 messages — exactly how many there are in
this topic.
I wouldn't expect Kafka to do that.
Note that in my test, I ran a script that would switch Kafka on and then after
a random period of time off. Maybe that was what caused this bug to happen.
For the record, that's how much timed passed between on/off switches:
{code:java}
sleeping 29575 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 257347 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 6388 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 214066 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 1095 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 155721 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 8689 ms before switching Kafka off
statefulset.apps/kafka-a scaled
Kafka switched off
sleeping 188854 ms before switching Kafka on
statefulset.apps/kafka-a scaled
kafka switched on
sleeping 39113 ms before switching Kafka off{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)