Hi, I have posted this question on SO: https://stackoverflow.com/questions/67625641/kafka-segments-are-deleted-too-often-or-not-at-all, but wanted to re-post it here as well in case someone spots the issue right away.
Thank you for your help!

We have two topics on our Kafka cluster that exhibit weird (wrong) behavior related to their retention configuration.

The first topic, **tracking.ap.client.traffic**, has retention set explicitly to `retention.ms=14400000` (4 hrs), but it is never cleaned up: it keeps growing in size and has caused 2 out of 3 Kafka brokers to run out of disk space. (Details about config and logs below.)

The second topic, **tracking.ap.client.traffic.keyed**, is created in KSQL as a stream topic:

```
CREATE STREAM AP_CLIENT_TRAFFIC_KEYED
  WITH (KAFKA_TOPIC='tracking.ap.client.traffic.keyed', TIMESTAMP='ACTIVITY_DATE', VALUE_FORMAT='JSON', PARTITIONS=6)
  AS SELECT ...
```

Its retention is set to the default broker value, which is 720 hrs:

```
cat /etc/kafka/server.properties | grep retention
log.retention.hours=720
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# to the retention policies
log.retention.check.interval.ms=300000
```

This topic, though, gets cleaned up every 5 minutes or so, according to the logs. The log entries say the segments are marked for deletion "due to retention time 2592000000ms breach (kafka.log.Log)" - but how can that be true if this is happening every 5 minutes? No size-based retention is set for either topic.

Two questions:

1. Why is the first topic not being cleaned up at all?
2. Why is the second topic being cleaned up so often?
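As a sanity check on the units involved (plain arithmetic, nothing Kafka-specific), the 2592000000 ms in the broker log message is exactly the 720-hour broker default, and 14400000 ms is exactly the intended 4 hours:

```python
# Quick arithmetic check on the retention values mentioned above.
broker_default_hours = 720           # log.retention.hours=720 (broker default)
log_message_ms = 2_592_000_000       # "retention time 2592000000ms breach"
explicit_retention_ms = 14_400_000   # retention.ms on tracking.ap.client.traffic

# 720 hours = 2592000000 ms = 30 days, matching the log message exactly.
assert broker_default_hours * 60 * 60 * 1000 == log_message_ms
assert log_message_ms // (24 * 60 * 60 * 1000) == 30

# 14400000 ms = 4 hours, as intended for the first topic.
assert explicit_retention_ms // (60 * 60 * 1000) == 4
```

So the log message itself is at least consistent with the 720-hour default being applied to the keyed topic; the puzzle is why segments "breach" it minutes after being written.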
Below are the log details and the full config for both topics.

Log entries for the **tracking.ap.client.traffic.keyed**-2 topic/partition show that this partition is being cleaned too often:

```
[2021-05-19 21:35:05,822] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Incrementing log start offset to 11755700 (kafka.log.Log)
[2021-05-19 21:36:05,822] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting segment 11753910 (kafka.log.Log)
[2021-05-19 21:36:05,825] INFO Deleted log /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.log.deleted. (kafka.log.LogSegment)
[2021-05-19 21:36:05,827] INFO Deleted offset index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.index.deleted. (kafka.log.LogSegment)
[2021-05-19 21:36:05,829] INFO Deleted time index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.timeindex.deleted. (kafka.log.LogSegment)
[2021-05-19 21:40:05,838] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found deletable segments with base offsets [11755700] due to retention time 2592000000ms breach (kafka.log.Log)
[2021-05-19 21:40:05,843] INFO [ProducerStateManager partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at offset 11757417 (kafka.log.ProducerStateManager)
[2021-05-19 21:40:05,845] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new log segment at offset 11757417 in 7 ms. (kafka.log.Log)
[2021-05-19 21:40:05,845] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling log segment [baseOffset 11755700, size 936249] for deletion. (kafka.log.Log)
[2021-05-19 21:40:05,847] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Incrementing log start offset to 11757417 (kafka.log.Log)
[2021-05-19 21:41:05,848] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting segment 11755700 (kafka.log.Log)
[2021-05-19 21:41:05,850] INFO Deleted log /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.log.deleted. (kafka.log.LogSegment)
[2021-05-19 21:41:05,852] INFO Deleted offset index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.index.deleted. (kafka.log.LogSegment)
[2021-05-19 21:41:05,853] INFO Deleted time index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.timeindex.deleted. (kafka.log.LogSegment)
[2021-05-19 21:45:05,834] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found deletable segments with base offsets [11757417] due to retention time 2592000000ms breach (kafka.log.Log)
[2021-05-19 21:45:05,837] INFO [ProducerStateManager partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at offset 11759201 (kafka.log.ProducerStateManager)
[2021-05-19 21:45:05,839] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new log segment at offset 11759201 in 6 ms. (kafka.log.Log)
[2021-05-19 21:45:05,839] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling log segment [baseOffset 11757417, size 977444] for deletion. (kafka.log.Log)
[2021-05-19 21:45:05,842] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Incrementing log start offset to 11759201 (kafka.log.Log)
[2021-05-19 21:46:05,842] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting segment 11757417 (kafka.log.Log)
[2021-05-19 21:46:05,844] INFO Deleted log /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.log.deleted. (kafka.log.LogSegment)
[2021-05-19 21:46:05,845] INFO Deleted offset index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.index.deleted. (kafka.log.LogSegment)
[2021-05-19 21:46:05,847] INFO Deleted time index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.timeindex.deleted. (kafka.log.LogSegment)
[2021-05-19 21:50:05,832] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found deletable segments with base offsets [11759201] due to retention time 2592000000ms breach (kafka.log.Log)
[2021-05-19 21:50:05,837] INFO [ProducerStateManager partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at offset 11760980 (kafka.log.ProducerStateManager)
[2021-05-19 21:50:05,839] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new log segment at offset 11760980 in 7 ms. (kafka.log.Log)
[2021-05-19 21:50:05,839] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling log segment [baseOffset 11759201, size 991447] for deletion. (kafka.log.Log)
[2021-05-19 21:50:05,844] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Incrementing log start offset to 11760980 (kafka.log.Log)
[2021-05-19 21:51:05,844] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting segment 11759201 (kafka.log.Log)
[2021-05-19 21:51:05,846] INFO Deleted log /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.log.deleted. (kafka.log.LogSegment)
[2021-05-19 21:51:05,848] INFO Deleted offset index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.index.deleted. (kafka.log.LogSegment)
[2021-05-19 21:51:05,849] INFO Deleted time index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.timeindex.deleted. (kafka.log.LogSegment)
[2021-05-19 21:55:05,828] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found deletable segments with base offsets [11760980] due to retention time 2592000000ms breach (kafka.log.Log)
[2021-05-19 21:55:05,833] INFO [ProducerStateManager partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at offset 11762941 (kafka.log.ProducerStateManager)
[2021-05-19 21:55:05,835] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new log segment at offset 11762941 in 7 ms. (kafka.log.Log)
[2021-05-19 21:55:05,835] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling log segment [baseOffset 11760980, size 1079204] for deletion. (kafka.log.Log)
[2021-05-19 21:55:05,838] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Incrementing log start offset to 11762941 (kafka.log.Log)
[2021-05-19 21:56:05,838] INFO [Log partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting segment 11760980 (kafka.log.Log)
[2021-05-19 21:56:05,840] INFO Deleted log /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.log.deleted. (kafka.log.LogSegment)
[2021-05-19 21:56:05,842] INFO Deleted offset index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.index.deleted. (kafka.log.LogSegment)
[2021-05-19 21:56:05,843] INFO Deleted time index /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.timeindex.deleted. (kafka.log.LogSegment)
```

There are **no such entries** for log cleaning for the first topic, tracking.ap.client.traffic, which suggests it is never even checked against its retention settings.
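For context on how I read those log messages, here is a simplified sketch of my understanding of the time-based retention check (the function and sample values are mine, not Kafka's actual code): eligibility is decided by the newest *record timestamp* inside a segment, not by the file's age on disk.

```python
import time

def breaches_time_retention(largest_record_ts_ms, retention_ms, now_ms=None):
    """Simplified model (my understanding, not Kafka's actual code):
    a segment is eligible for time-based deletion once the NEWEST
    record timestamp it contains is older than now - retention.ms.
    The broker uses record timestamps (CreateTime, or the column named
    by the KSQL TIMESTAMP property), not the segment file's mtime.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - largest_record_ts_ms > retention_ms

now = 1_621_459_200_000    # roughly the time of the logs above (2021-05-19)
retention = 2_592_000_000  # 2592000000 ms = the 30-day broker default

# Records stamped with old timestamps make a freshly written segment
# "breach" 30-day retention the moment it is rolled:
assert breaches_time_retention(now - retention - 1, retention, now)

# Conversely, a single record with a recent (or future) timestamp keeps
# the whole segment alive, no matter how old the rest of its data is:
assert not breaches_time_retention(now + 3_600_000, retention, now)
```

If this model is right, the 5-minute deletions would make sense whenever the ACTIVITY_DATE timestamps written by the KSQL query are more than 30 days in the past, and a single recent- or future-stamped record could likewise keep the first topic's only segment from ever breaching its 4-hour retention. I may well be wrong, though, hence the question.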
Full config for the **tracking.ap.client.traffic.keyed** topic (the one that is cleaned up too often):

```
mac-lt-mpopova:confluent-6.1.0 mpopova$ ./bin/kafka-configs --bootstrap-server mybroker1:9092 --entity-type topics --entity-name tracking.ap.client.traffic.keyed --describe --all
All configs for topic tracking.ap.client.traffic.keyed are:
compression.type=producer sensitive=false synonyms={DEFAULT_CONFIG:compression.type=producer}
leader.replication.throttled.replicas= sensitive=false synonyms={}
message.downconversion.enable=true sensitive=false synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
min.insync.replicas=2 sensitive=false synonyms={STATIC_BROKER_CONFIG:min.insync.replicas=2, DEFAULT_CONFIG:min.insync.replicas=1}
segment.jitter.ms=0 sensitive=false synonyms={}
cleanup.policy=delete sensitive=false synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
flush.ms=9223372036854775807 sensitive=false synonyms={}
follower.replication.throttled.replicas= sensitive=false synonyms={}
segment.bytes=1073741824 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.segment.bytes=1073741824, DEFAULT_CONFIG:log.segment.bytes=1073741824}
retention.ms=2592000000 sensitive=false synonyms={}
flush.messages=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
message.format.version=2.1-IV2 sensitive=false synonyms={DEFAULT_CONFIG:log.message.format.version=2.1-IV2}
file.delete.delay.ms=60000 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
max.message.bytes=1000012 sensitive=false synonyms={DEFAULT_CONFIG:message.max.bytes=1000012}
min.compaction.lag.ms=0 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
message.timestamp.type=CreateTime sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
preallocate=false sensitive=false synonyms={DEFAULT_CONFIG:log.preallocate=false}
min.cleanable.dirty.ratio=0.5 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
index.interval.bytes=4096 sensitive=false synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
unclean.leader.election.enable=false sensitive=false synonyms={STATIC_BROKER_CONFIG:unclean.leader.election.enable=false, DEFAULT_CONFIG:unclean.leader.election.enable=false}
retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
segment.ms=604800000 sensitive=false synonyms={}
message.timestamp.difference.max.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
segment.index.bytes=10485760 sensitive=false synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}
```

Full config for the **tracking.ap.client.traffic** topic (the one not being cleaned up at all):

```
mac-lt-mpopova:confluent-6.1.0 mpopova$ ./bin/kafka-configs --bootstrap-server mybroker1:9092 --entity-type topics --entity-name tracking.ap.client.traffic --describe --all
All configs for topic tracking.ap.client.traffic are:
compression.type=producer sensitive=false synonyms={DEFAULT_CONFIG:compression.type=producer}
leader.replication.throttled.replicas= sensitive=false synonyms={}
message.downconversion.enable=true sensitive=false synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
min.insync.replicas=2 sensitive=false synonyms={STATIC_BROKER_CONFIG:min.insync.replicas=2, DEFAULT_CONFIG:min.insync.replicas=1}
segment.jitter.ms=0 sensitive=false synonyms={}
cleanup.policy=delete sensitive=false synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
flush.ms=9223372036854775807 sensitive=false synonyms={}
follower.replication.throttled.replicas= sensitive=false synonyms={}
segment.bytes=1073741824 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.segment.bytes=1073741824, DEFAULT_CONFIG:log.segment.bytes=1073741824}
retention.ms=14400000 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:retention.ms=14400000}
flush.messages=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
message.format.version=2.1-IV2 sensitive=false synonyms={DEFAULT_CONFIG:log.message.format.version=2.1-IV2}
file.delete.delay.ms=60000 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
max.message.bytes=1000012 sensitive=false synonyms={DEFAULT_CONFIG:message.max.bytes=1000012}
min.compaction.lag.ms=0 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
message.timestamp.type=CreateTime sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
preallocate=false sensitive=false synonyms={DEFAULT_CONFIG:log.preallocate=false}
min.cleanable.dirty.ratio=0.5 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
index.interval.bytes=4096 sensitive=false synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
unclean.leader.election.enable=false sensitive=false synonyms={STATIC_BROKER_CONFIG:unclean.leader.election.enable=false, DEFAULT_CONFIG:unclean.leader.election.enable=false}
retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
segment.ms=604800000 sensitive=false synonyms={}
message.timestamp.difference.max.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
segment.index.bytes=10485760 sensitive=false synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}
```

Here is an example of the log directory contents for both topics.

ls -la /apps/kafka-data/**tracking.ap.client.traffic**-*

```
/apps/kafka-data/tracking.ap.client.traffic-1:
total 412640
drwxr-xr-x 2 cp-kafka confluent 141 May 17 15:01 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420599482 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 15:01 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic-11:
total 412844
drwxr-xr-x 2 cp-kafka confluent 141 May 17 11:25 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420788365 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 11:25 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic-13:
total 412660
drwxr-xr-x 2 cp-kafka confluent 141 May 17 15:01 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420606307 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 15:01 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic-14:
total 412920
drwxr-xr-x 2 cp-kafka confluent 141 May 17 11:25 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420862000 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 11:25 leader-epoch-checkpoint
```

ls -la /apps/kafka-data/**tracking.ap.client.traffic.keyed***

```
/apps/kafka-data/tracking.ap.client.traffic.keyed-1:
total 288
drwxr-xr-x 2 cp-kafka confluent 141 May 19 22:47 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000004255431.index
-rw-r--r-- 1 cp-kafka confluent 254355 May 19 22:48 00000000000004255431.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 00000000000004255431.timeindex
-rw-r--r-- 1 cp-kafka confluent 14 May 19 22:47 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic.keyed-2:
total 816
drwxr-xr-x 2 cp-kafka confluent 178 May 19 22:46 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000011782064.index
-rw-r--r-- 1 cp-kafka confluent 787175 May 19 22:48 00000000000011782064.log
-rw-r--r-- 1 cp-kafka confluent 10 May 19 22:45 00000000000011782064.snapshot
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 00000000000011782064.timeindex
-rw-r--r-- 1 cp-kafka confluent 15 May 19 22:45 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic.keyed-3:
total 1812
drwxr-xr-x 2 cp-kafka confluent 141 May 19 22:47 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000028393394.index
-rw-r--r-- 1 cp-kafka confluent 1814528 May 19 22:48 00000000000028393394.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 00000000000028393394.timeindex
-rw-r--r-- 1 cp-kafka confluent 15 May 19 22:47 leader-epoch-checkpoint
```

Thank you!