Hi, I have posted this question on SO:
https://stackoverflow.com/questions/67625641/kafka-segments-are-deleted-too-often-or-not-at-all
but wanted to re-post here as well in case someone spots the issue right away 
....

Thank you for your help!

>>>>>
We have two topics on our Kafka cluster that exhibit weird (wrong) behavior 
related to retention configuration.

One topic, **tracking.ap.client.traffic**, has retention set explicitly to 
"retention.ms=14400000" (4 hrs) - but it is not cleaned up , grows in size, and 
caused 2 out of 3 kafka brokers to run out of disk space. (details about config 
and log below)

Second topic, **tracking.ap.client.traffic.keyed**, is created in KSQL as a 
stream topic:
```
CREATE STREAM AP_CLIENT_TRAFFIC_KEYED
WITH (KAFKA_TOPIC='tracking.ap.client.traffic.keyed', 
TIMESTAMP='ACTIVITY_DATE', VALUE_FORMAT='JSON', PARTITIONS=6)
AS SELECT
...

```
its retention is set to the default broker value, which is 720 hrs :

```
cat /etc/kafka/server.properties | grep retention
log.retention.hours=720
# A size-based retention policy for logs. Segments are pruned from the log 
unless the remaining
# segments drop below log.retention.bytes. Functions independently of 
log.retention.hours.
#log.retention.bytes=1073741824
# to the retention policies
log.retention.check.interval.ms=300000
```

This topic, though, gets cleaned up every 5 min or so - according to the 
logs....
The log entry says the segment is marked for deletion "due to retention time 
2592000000ms breach (kafka.log.Log)" - but how can that be true if this is 
happening every 5 min??

No size-based retention is set for any of the topics.

Two questions:
1. why is the first topic not being cleaned p?
2. why is the second topic being cleaned up so often?

Below are the details about logs and full config of both topics:

log entries for **tracking.ap.client.traffic.keyed**-2 topic/partition - show 
that this partition is getting cleaned too often:

```
[2021-05-19 21:35:05,822] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] 
Incrementing log start offset to 11755700 (kafka.log.Log)
[2021-05-19 21:36:05,822] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting 
segment 11753910 (kafka.log.Log)
[2021-05-19 21:36:05,825] INFO Deleted log 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.log.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:36:05,827] INFO Deleted offset index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.index.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:36:05,829] INFO Deleted time index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011753910.timeindex.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:40:05,838] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found 
deletable segments with base offsets [11755700] due to retention time 
2592000000ms breach (kafka.log.Log)
[2021-05-19 21:40:05,843] INFO [ProducerStateManager 
partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at 
offset 11757417 (kafka.log.ProducerStateManager)
[2021-05-19 21:40:05,845] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new 
log segment at offset 11757417 in 7 ms. (kafka.log.Log)
[2021-05-19 21:40:05,845] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling 
log segment [baseOffset 11755700, size 936249] for deletion. (kafka.log.Log)
[2021-05-19 21:40:05,847] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] 
Incrementing log start offset to 11757417 (kafka.log.Log)
[2021-05-19 21:41:05,848] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting 
segment 11755700 (kafka.log.Log)
[2021-05-19 21:41:05,850] INFO Deleted log 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.log.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:41:05,852] INFO Deleted offset index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.index.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:41:05,853] INFO Deleted time index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011755700.timeindex.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:45:05,834] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found 
deletable segments with base offsets [11757417] due to retention time 
2592000000ms breach (kafka.log.Log)
[2021-05-19 21:45:05,837] INFO [ProducerStateManager 
partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at 
offset 11759201 (kafka.log.ProducerStateManager)
[2021-05-19 21:45:05,839] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new 
log segment at offset 11759201 in 6 ms. (kafka.log.Log)
[2021-05-19 21:45:05,839] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling 
log segment [baseOffset 11757417, size 977444] for deletion. (kafka.log.Log)
[2021-05-19 21:45:05,842] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] 
Incrementing log start offset to 11759201 (kafka.log.Log)
[2021-05-19 21:46:05,842] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting 
segment 11757417 (kafka.log.Log)
[2021-05-19 21:46:05,844] INFO Deleted log 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.log.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:46:05,845] INFO Deleted offset index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.index.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:46:05,847] INFO Deleted time index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011757417.timeindex.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:50:05,832] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found 
deletable segments with base offsets [11759201] due to retention time 
2592000000ms breach (kafka.log.Log)
[2021-05-19 21:50:05,837] INFO [ProducerStateManager 
partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at 
offset 11760980 (kafka.log.ProducerStateManager)
[2021-05-19 21:50:05,839] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new 
log segment at offset 11760980 in 7 ms. (kafka.log.Log)
[2021-05-19 21:50:05,839] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling 
log segment [baseOffset 11759201, size 991447] for deletion. (kafka.log.Log)
[2021-05-19 21:50:05,844] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] 
Incrementing log start offset to 11760980 (kafka.log.Log)
[2021-05-19 21:51:05,844] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting 
segment 11759201 (kafka.log.Log)
[2021-05-19 21:51:05,846] INFO Deleted log 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.log.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:51:05,848] INFO Deleted offset index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.index.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:51:05,849] INFO Deleted time index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011759201.timeindex.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:55:05,828] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Found 
deletable segments with base offsets [11760980] due to retention time 
2592000000ms breach (kafka.log.Log)
[2021-05-19 21:55:05,833] INFO [ProducerStateManager 
partition=tracking.ap.client.traffic.keyed-2] Writing producer snapshot at 
offset 11762941 (kafka.log.ProducerStateManager)
[2021-05-19 21:55:05,835] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Rolled new 
log segment at offset 11762941 in 7 ms. (kafka.log.Log)
[2021-05-19 21:55:05,835] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Scheduling 
log segment [baseOffset 11760980, size 1079204] for deletion. (kafka.log.Log)
[2021-05-19 21:55:05,838] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] 
Incrementing log start offset to 11762941 (kafka.log.Log)
[2021-05-19 21:56:05,838] INFO [Log 
partition=tracking.ap.client.traffic.keyed-2, dir=/apps/kafka-data] Deleting 
segment 11760980 (kafka.log.Log)
[2021-05-19 21:56:05,840] INFO Deleted log 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.log.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:56:05,842] INFO Deleted offset index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.index.deleted.
 (kafka.log.LogSegment)
[2021-05-19 21:56:05,843] INFO Deleted time index 
/apps/kafka-data/tracking.ap.client.traffic.keyed-2/00000000000011760980.timeindex.deleted.
 (kafka.log.LogSegment)
```

There are **no such entries** for log cleaning for the first topic, 
tracking.ap.client.traffic, which indicates it is never even checked for 
hitting retention ??

Full config for **tracking.ap.client.traffic.keyed** topic: (that is cleaned up 
too often):

```
mac-lt-mpopova:confluent-6.1.0 mpopova$ ./bin/kafka-configs --bootstrap-server 
mybroker1:9092 --entity-type topics --entity-name 
tracking.ap.client.traffic.keyed --describe --all

All configs for topic tracking.ap.client.traffic.keyed are:
compression.type=producer sensitive=false 
synonyms={DEFAULT_CONFIG:compression.type=producer}
leader.replication.throttled.replicas= sensitive=false synonyms={}
message.downconversion.enable=true sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
min.insync.replicas=2 sensitive=false 
synonyms={STATIC_BROKER_CONFIG:min.insync.replicas=2, 
DEFAULT_CONFIG:min.insync.replicas=1}
segment.jitter.ms=0 sensitive=false synonyms={}
cleanup.policy=delete sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
flush.ms=9223372036854775807 sensitive=false synonyms={}
follower.replication.throttled.replicas= sensitive=false synonyms={}
segment.bytes=1073741824 sensitive=false 
synonyms={STATIC_BROKER_CONFIG:log.segment.bytes=1073741824, 
DEFAULT_CONFIG:log.segment.bytes=1073741824}
retention.ms=2592000000 sensitive=false synonyms={}
flush.messages=9223372036854775807 sensitive=false 
synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
message.format.version=2.1-IV2 sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.format.version=2.1-IV2}
file.delete.delay.ms=60000 sensitive=false 
synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
max.message.bytes=1000012 sensitive=false 
synonyms={DEFAULT_CONFIG:message.max.bytes=1000012}
min.compaction.lag.ms=0 sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
message.timestamp.type=CreateTime sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
preallocate=false sensitive=false 
synonyms={DEFAULT_CONFIG:log.preallocate=false}
min.cleanable.dirty.ratio=0.5 sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
index.interval.bytes=4096 sensitive=false 
synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
unclean.leader.election.enable=false sensitive=false 
synonyms={STATIC_BROKER_CONFIG:unclean.leader.election.enable=false, 
DEFAULT_CONFIG:unclean.leader.election.enable=false}
retention.bytes=-1 sensitive=false 
synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
delete.retention.ms=86400000 sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
segment.ms=604800000 sensitive=false synonyms={}
message.timestamp.difference.max.ms=9223372036854775807 sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
segment.index.bytes=10485760 sensitive=false 
synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}
```

Full config for **tracking.ap.client.traffic** topic: (not being cleaned up at 
all):
```
mac-lt-mpopova:confluent-6.1.0 mpopova$ ./bin/kafka-configs --bootstrap-server 
mybroker1:9092 --entity-type topics --entity-name tracking.ap.client.traffic 
--describe --all

All configs for topic tracking.ap.client.traffic are:
compression.type=producer sensitive=false 
synonyms={DEFAULT_CONFIG:compression.type=producer}
leader.replication.throttled.replicas= sensitive=false synonyms={}
message.downconversion.enable=true sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
min.insync.replicas=2 sensitive=false 
synonyms={STATIC_BROKER_CONFIG:min.insync.replicas=2, 
DEFAULT_CONFIG:min.insync.replicas=1}
segment.jitter.ms=0 sensitive=false synonyms={}
cleanup.policy=delete sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
flush.ms=9223372036854775807 sensitive=false synonyms={}
follower.replication.throttled.replicas= sensitive=false synonyms={}
segment.bytes=1073741824 sensitive=false 
synonyms={STATIC_BROKER_CONFIG:log.segment.bytes=1073741824, 
DEFAULT_CONFIG:log.segment.bytes=1073741824}
retention.ms=14400000 sensitive=false 
synonyms={DYNAMIC_TOPIC_CONFIG:retention.ms=14400000}
flush.messages=9223372036854775807 sensitive=false 
synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
message.format.version=2.1-IV2 sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.format.version=2.1-IV2}
file.delete.delay.ms=60000 sensitive=false 
synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
max.message.bytes=1000012 sensitive=false 
synonyms={DEFAULT_CONFIG:message.max.bytes=1000012}
min.compaction.lag.ms=0 sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
message.timestamp.type=CreateTime sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
preallocate=false sensitive=false 
synonyms={DEFAULT_CONFIG:log.preallocate=false}
min.cleanable.dirty.ratio=0.5 sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
index.interval.bytes=4096 sensitive=false 
synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
unclean.leader.election.enable=false sensitive=false 
synonyms={STATIC_BROKER_CONFIG:unclean.leader.election.enable=false, 
DEFAULT_CONFIG:unclean.leader.election.enable=false}
retention.bytes=-1 sensitive=false 
synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
delete.retention.ms=86400000 sensitive=false 
synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
segment.ms=604800000 sensitive=false synonyms={}
message.timestamp.difference.max.ms=9223372036854775807 sensitive=false 
synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
segment.index.bytes=10485760 sensitive=false 
synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}
```

Here is an example directory content of the logs for both topics:

ls -la /apps/kafka-data/**tracking.ap.client.traffic**-*
```
/apps/kafka-data/tracking.ap.client.traffic-1:
total 412640
drwxr-xr-x 2 cp-kafka confluent 141 May 17 15:01 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420599482 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 
00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 15:01 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic-11:
total 412844
drwxr-xr-x 2 cp-kafka confluent 141 May 17 11:25 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420788365 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 
00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 11:25 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic-13:
total 412660
drwxr-xr-x 2 cp-kafka confluent 141 May 17 15:01 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420606307 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 
00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 15:01 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic-14:
total 412920
drwxr-xr-x 2 cp-kafka confluent 141 May 17 11:25 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000000000000.index
-rw-r--r-- 1 cp-kafka confluent 420862000 May 19 22:48 00000000000000000000.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 
00000000000000000000.timeindex
-rw-r--r-- 1 cp-kafka confluent 8 May 17 11:25 leader-epoch-checkpoint
```

ls -la /apps/kafka-data/**tracking.ap.client.traffic.keyed***
```
/apps/kafka-data/tracking.ap.client.traffic.keyed-1:
total 288
drwxr-xr-x 2 cp-kafka confluent 141 May 19 22:47 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000004255431.index
-rw-r--r-- 1 cp-kafka confluent 254355 May 19 22:48 00000000000004255431.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 
00000000000004255431.timeindex
-rw-r--r-- 1 cp-kafka confluent 14 May 19 22:47 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic.keyed-2:
total 816
drwxr-xr-x 2 cp-kafka confluent 178 May 19 22:46 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000011782064.index
-rw-r--r-- 1 cp-kafka confluent 787175 May 19 22:48 00000000000011782064.log
-rw-r--r-- 1 cp-kafka confluent 10 May 19 22:45 00000000000011782064.snapshot
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 
00000000000011782064.timeindex
-rw-r--r-- 1 cp-kafka confluent 15 May 19 22:45 leader-epoch-checkpoint

/apps/kafka-data/tracking.ap.client.traffic.keyed-3:
total 1812
drwxr-xr-x 2 cp-kafka confluent 141 May 19 22:47 .
drwxr-xr-x 290 cp-kafka confluent 20480 May 19 22:48 ..
-rw-r--r-- 1 cp-kafka confluent 10485760 May 19 22:48 00000000000028393394.index
-rw-r--r-- 1 cp-kafka confluent 1814528 May 19 22:48 00000000000028393394.log
-rw-r--r-- 1 cp-kafka confluent 10485756 May 19 22:48 
00000000000028393394.timeindex
-rw-r--r-- 1 cp-kafka confluent 15 May 19 22:47 leader-epoch-checkpoint
```

Thank you!

Sent with [ProtonMail](https://protonmail.com) Secure Email.

Reply via email to