Any idea what is wrong here? I have restarted Kafka brokers a few times, and all other Confluent services like KSQL - but I see exactly the same behavior - one topic gets its logs cleaned up every 5 minutes, while the other one - does not get cleaned up at all ....
Is there anything else I could check, apart from what I already did - see the post below - to troubleshoot this? thank you! Marina Sent with ProtonMail Secure Email. ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Thursday, May 20, 2021 2:10 PM, Marina Popova <ppin...@protonmail.com.INVALID> wrote: > Hi, I have posted this question on SO: > https://stackoverflow.com/questions/67625641/kafka-segments-are-deleted-too-often-or-not-at-all > but wanted to re-post here as well in case someone spots the issue right away > .... > > Thank you for your help! > > > > > > > > > We have two topics on our Kafka cluster that exhibit weird (wrong) behavior > related to retention configuration. > > One topic, tracking.ap.client.traffic, has retention set explicitly to > "retention.ms=14400000" (4 hrs) - but it is not cleaned up , grows in size, > and caused 2 out of 3 kafka brokers to run out of disk space. (details about > config and log below) > > Second topic, tracking.ap.client.traffic.keyed, is created in KSQL as a > stream topic: > > CREATE STREAM AP_CLIENT_TRAFFIC_KEYED > WITH (KAFKA_TOPIC='tracking.ap.client.traffic.keyed', > TIMESTAMP='ACTIVITY_DATE', VALUE_FORMAT='JSON', PARTITIONS=6) > AS SELECT > ... > > > > its retention is set to the default broker value, which is 720 hrs : > > cat /etc/kafka/server.properties | grep retention > log.retention.hours=720 > # A size-based retention policy for logs. Segments are pruned from the > log unless the remaining > # segments drop below log.retention.bytes. Functions independently of > log.retention.hours. > #log.retention.bytes=1073741824 > # to the retention policies > log.retention.check.interval.ms=300000 > > > This topic, though, gets cleaned up every 5 min or so - according to the > logs.... > The log entry says the segment is marked for deletion "due to retention time > 2592000000ms breach (kafka.log.Log)" - but how can that be true if this is > happening every 5 min?? > > No size-based retention is set for any of the topics. > > Two questions: > > 1. why is the first topic not being cleaned p? > 2. why is the second topic being cleaned up so often? > > Below are the details about logs and full config of both topics: > > log entries for tracking.ap.client.traffic.keyed-2 topic/partition - show > that this partition is getting cleaned too often: > > > [2021-05-19 21:35:05,822] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Incrementing log start offset to 11755700 (kafka.log.Log) > [2021-05-19 21:36:05,822] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting > segment 11753910 (kafka.log.Log) > [2021-05-19 21:36:05,825] INFO Deleted log > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.log.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:36:05,827] INFO Deleted offset index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.index.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:36:05,829] INFO Deleted time index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.timeindex.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:40:05,838] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found > deletable segments with base offsets [11755700] due to retention time > 2592000000ms breach (kafka.log.Log) > [2021-05-19 21:40:05,843] INFO [ProducerStateManager > partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at > offset 11757417 (kafka.log.ProducerStateManager) > [2021-05-19 21:40:05,845] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled > new log segment at offset 11757417 in 7 ms. (kafka.log.Log) > [2021-05-19 21:40:05,845] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Scheduling log segment [baseOffset 11755700, size 936249] for deletion. > (kafka.log.Log) > [2021-05-19 21:40:05,847] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Incrementing log start offset to 11757417 (kafka.log.Log) > [2021-05-19 21:41:05,848] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting > segment 11755700 (kafka.log.Log) > [2021-05-19 21:41:05,850] INFO Deleted log > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.log.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:41:05,852] INFO Deleted offset index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.index.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:41:05,853] INFO Deleted time index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.timeindex.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:45:05,834] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found > deletable segments with base offsets [11757417] due to retention time > 2592000000ms breach (kafka.log.Log) > [2021-05-19 21:45:05,837] INFO [ProducerStateManager > partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at > offset 11759201 (kafka.log.ProducerStateManager) > [2021-05-19 21:45:05,839] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled > new log segment at offset 11759201 in 6 ms. (kafka.log.Log) > [2021-05-19 21:45:05,839] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Scheduling log segment [baseOffset 11757417, size 977444] for deletion. > (kafka.log.Log) > [2021-05-19 21:45:05,842] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Incrementing log start offset to 11759201 (kafka.log.Log) > [2021-05-19 21:46:05,842] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting > segment 11757417 (kafka.log.Log) > [2021-05-19 21:46:05,844] INFO Deleted log > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.log.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:46:05,845] INFO Deleted offset index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.index.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:46:05,847] INFO Deleted time index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.timeindex.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:50:05,832] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found > deletable segments with base offsets [11759201] due to retention time > 2592000000ms breach (kafka.log.Log) > [2021-05-19 21:50:05,837] INFO [ProducerStateManager > partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at > offset 11760980 (kafka.log.ProducerStateManager) > [2021-05-19 21:50:05,839] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled > new log segment at offset 11760980 in 7 ms. (kafka.log.Log) > [2021-05-19 21:50:05,839] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Scheduling log segment [baseOffset 11759201, size 991447] for deletion. > (kafka.log.Log) > [2021-05-19 21:50:05,844] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Incrementing log start offset to 11760980 (kafka.log.Log) > [2021-05-19 21:51:05,844] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting > segment 11759201 (kafka.log.Log) > [2021-05-19 21:51:05,846] INFO Deleted log > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.log.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:51:05,848] INFO Deleted offset index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.index.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:51:05,849] INFO Deleted time index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.timeindex.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:55:05,828] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found > deletable segments with base offsets [11760980] due to retention time > 2592000000ms breach (kafka.log.Log) > [2021-05-19 21:55:05,833] INFO [ProducerStateManager > partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at > offset 11762941 (kafka.log.ProducerStateManager) > [2021-05-19 21:55:05,835] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled > new log segment at offset 11762941 in 7 ms. (kafka.log.Log) > [2021-05-19 21:55:05,835] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Scheduling log segment [baseOffset 11760980, size 1079204] for deletion. > (kafka.log.Log) > [2021-05-19 21:55:05,838] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] > Incrementing log start offset to 11762941 (kafka.log.Log) > [2021-05-19 21:56:05,838] INFO [Log > partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting > segment 11760980 (kafka.log.Log) > [2021-05-19 21:56:05,840] INFO Deleted log > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.log.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:56:05,842] INFO Deleted offset index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.index.deleted. > (kafka.log.LogSegment) > [2021-05-19 21:56:05,843] INFO Deleted time index > /apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.timeindex.deleted. > (kafka.log.LogSegment) > > > There areno such entries for log cleaning for the first topic, > tracking.ap.client.traffic, which indicates it is never even checked for > hitting retention ?? > > Full config for tracking.ap.client.traffic.keyed topic: (that is cleaned up > too often): > > mac-lt-mpopova:confluent-6.1.0 mpopova$ ./bin/kafka-configs > --bootstrap-server mybroker1:9092 --entity-type topics --entity-name > tracking.ap.client.traffic.keyed --describe --all > > All configs for topic tracking.ap.client.traffic.keyed are: > compression.type=producer sensitive=false > synonyms={DEFAULT_CONFIG:compression.type=producer} > leader.replication.throttled.replicas= sensitive=false synonyms={} > message.downconversion.enable=true sensitive=false > synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true} > min.insync.replicas=2 sensitive=false > synonyms={STATIC_BROKER_CONFIG:min.insync.replicas=2, > DEFAULT_CONFIG:min.insync.replicas=1} > segment.jitter.ms=0 sensitive=false synonyms={} > cleanup.policy=delete sensitive=false > synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete} > flush.ms=9223372036854775807 sensitive=false synonyms={} > follower.replication.throttled.replicas= sensitive=false synonyms={} > segment.bytes=1073741824 sensitive=false > synonyms={STATIC_BROKER_CONFIG:log.segment.bytes=1073741824, > DEFAULT_CONFIG:log.segment.bytes=1073741824} > retention.ms=2592000000 sensitive=false synonyms={} > flush.messages=9223372036854775807 sensitive=false > synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807} > message.format.version=2.1-IV2 sensitive=false > synonyms={DEFAULT_CONFIG:log.message.format.version=2.1-IV2} > file.delete.delay.ms=60000 sensitive=false > synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000} > max.message.bytes=1000012 sensitive=false > synonyms={DEFAULT_CONFIG:message.max.bytes=1000012} > min.compaction.lag.ms=0 sensitive=false > synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0} > message.timestamp.type=CreateTime sensitive=false > synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime} > preallocate=false sensitive=false > synonyms={DEFAULT_CONFIG:log.preallocate=false} > min.cleanable.dirty.ratio=0.5 sensitive=false > synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5} > index.interval.bytes=4096 sensitive=false > synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096} > unclean.leader.election.enable=false sensitive=false > synonyms={STATIC_BROKER_CONFIG:unclean.leader.election.enable=false, > DEFAULT_CONFIG:unclean.leader.election.enable=false} > retention.bytes=-1 sensitive=false > synonyms={DEFAULT_CONFIG:log.retention.bytes=-1} > delete.retention.ms=86400000 sensitive=false > synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000} > segment.ms=604800000 sensitive=false synonyms={} > message.timestamp.difference.max.ms=9223372036854775807 sensitive=false > synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807} > segment.index.bytes=10485760 sensitive=false > synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760} > > > Full config fortracking.ap.client.traffic topic: (not being cleaned up at > all): > > mac-lt-mpopova:confluent-6.1.0 mpopova$ ./bin/kafka-configs > --bootstrap-server mybroker1:9092 --entity-type topics --entity-name > tracking.ap.client.traffic --describe --all > > All configs for topic tracking.ap.client.traffic are: > compression.type=producer sensitive=false > synonyms={DEFAULT_CONFIG:compression.type=producer} > leader.replication.throttled.replicas= sensitive=false synonyms={} > message.downconversion.enable=true sensitive=false > synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true} > min.insync.replicas=2 sensitive=false > synonyms={STATIC_BROKER_CONFIG:min.insync.replicas=2, > DEFAULT_CONFIG:min.insync.replicas=1} > segment.jitter.ms=0 sensitive=false synonyms={} > cleanup.policy=delete sensitive=false > synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete} > flush.ms=9223372036854775807 sensitive=false synonyms={} > follower.replication.throttled.replicas= sensitive=false synonyms={} > segment.bytes=1073741824 sensitive=false > synonyms={STATIC_BROKER_CONFIG:log.segment.bytes=1073741824, > DEFAULT_CONFIG:log.segment.bytes=1073741824} > retention.ms=14400000 sensitive=false > synonyms={DYNAMIC_TOPIC_CONFIG:retention.ms=14400000} > flush.messages=9223372036854775807 sensitive=false > synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807} > message.format.version=2.1-IV2 sensitive=false > synonyms={DEFAULT_CONFIG:log.message.format.version=2.1-IV2} > file.delete.delay.ms=60000 sensitive=false > synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000} > max.message.bytes=1000012 sensitive=false > synonyms={DEFAULT_CONFIG:message.max.bytes=1000012} > min.compaction.lag.ms=0 sensitive=false > synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0} > message.timestamp.type=CreateTime sensitive=false > synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime} > preallocate=false sensitive=false > synonyms={DEFAULT_CONFIG:log.preallocate=false} > min.cleanable.dirty.ratio=0.5 sensitive=false > synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5} > index.interval.bytes=4096 sensitive=false > synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096} > unclean.leader.election.enable=false sensitive=false > synonyms={STATIC_BROKER_CONFIG:unclean.leader.election.enable=false, > DEFAULT_CONFIG:unclean.leader.election.enable=false} > retention.bytes=-1 sensitive=false > synonyms={DEFAULT_CONFIG:log.retention.bytes=-1} > delete.retention.ms=86400000 sensitive=false > synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000} > segment.ms=604800000 sensitive=false synonyms={} > message.timestamp.difference.max.ms=9223372036854775807 sensitive=false > synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807} > segment.index.bytes=10485760 sensitive=false > synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760} > > > Here is an example directory content of the logs for both topics: > > ls -la /apps/kafka-data/tracking.ap.client.traffic-* > > /apps/kafka-data/tracking.ap.client.traffic-1: > total 412640 > drwxr-xr-x 2 cp-kafka confluent 141 May 17 15:01 . > drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 .. > -rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 > 00000000000000000000.index > -rw-r--r-- 1 cp-kafka confluent 420599482 May 19 22:48 > 00000000000000000000.log > -rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 > 00000000000000000000.timeindex > -rw-r--r-- 1 cp-kafka confluent 8 May 17 15:01 leader-epoch-checkpoint > > /apps/kafka-data/tracking.ap.client.traffic-11: > total 412844 > drwxr-xr-x 2 cp-kafka confluent 141 May 17 11:25 . > drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 .. > -rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 > 00000000000000000000.index > -rw-r--r-- 1 cp-kafka confluent 420788365 May 19 22:48 > 00000000000000000000.log > -rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 > 00000000000000000000.timeindex > -rw-r--r-- 1 cp-kafka confluent 8 May 17 11:25 leader-epoch-checkpoint > > /apps/kafka-data/tracking.ap.client.traffic-13: > total 412660 > drwxr-xr-x 2 cp-kafka confluent 141 May 17 15:01 . > drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 .. > -rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 > 00000000000000000000.index > -rw-r--r-- 1 cp-kafka confluent 420606307 May 19 22:48 > 00000000000000000000.log > -rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 > 00000000000000000000.timeindex > -rw-r--r-- 1 cp-kafka confluent 8 May 17 15:01 leader-epoch-checkpoint > > /apps/kafka-data/tracking.ap.client.traffic-14: > total 412920 > drwxr-xr-x 2 cp-kafka confluent 141 May 17 11:25 . > drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 .. > -rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 > 00000000000000000000.index > -rw-r--r-- 1 cp-kafka confluent 420862000 May 19 22:48 > 00000000000000000000.log > -rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 > 00000000000000000000.timeindex > -rw-r--r-- 1 cp-kafka confluent 8 May 17 11:25 leader-epoch-checkpoint > > > ls -la /apps/kafka-data/tracking.ap.client.traffic.keyed* > > /apps/kafka-data/tracking.ap.client.traffic.keyed-1: > total 288 > drwxr-xr-x 2 cp-kafka confluent 141 May 19 22:47 . > drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 .. > -rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 > 00000000000004255431.index > -rw-r--r-- 1 cp-kafka confluent 254355 May 19 22:48 > 00000000000004255431.log > -rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 > 00000000000004255431.timeindex > -rw-r--r-- 1 cp-kafka confluent 14 May 19 22:47 leader-epoch-checkpoint > > /apps/kafka-data/tracking.ap.client.traffic.keyed-2: > total 816 > drwxr-xr-x 2 cp-kafka confluent 178 May 19 22:46 . > drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 .. > -rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 > 00000000000011782064.index > -rw-r--r-- 1 cp-kafka confluent 787175 May 19 22:48 > 00000000000011782064.log > -rw-r--r-- 1 cp-kafka confluent 10 May 19 22:45 > 00000000000011782064.snapshot > -rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 > 00000000000011782064.timeindex > -rw-r--r-- 1 cp-kafka confluent 15 May 19 22:45 leader-epoch-checkpoint > > /apps/kafka-data/tracking.ap.client.traffic.keyed-3: > total 1812 > drwxr-xr-x 2 cp-kafka confluent 141 May 19 22:47 . > drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 .. > -rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 > 00000000000028393394.index > -rw-r--r-- 1 cp-kafka confluent 1814528 May 19 22:48 > 00000000000028393394.log > -rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 > 00000000000028393394.timeindex > -rw-r--r-- 1 cp-kafka confluent 15 May 19 22:47 leader-epoch-checkpoint > > > Thank you! > > Sent with ProtonMail Secure Email.