Thanks Christopher,

Hrm, I checked the logs, and retention.ms has actually been set since somewhere around October of last year:

{"log":"[2019-10-20 03:01:41,818] INFO Processing override for entityPath: topics/ETD-TEST with config: Map(retention.ms -\u003e 1210000000) (kafka.server.DynamicConfigManager)\n","stream":"stdout","time":"2019-10-20T03:01:41.818402555Z"}

So I don't think the first hypothesis, that it was turned on recently, sticks. I'll have to investigate the other possibility you mentioned, but we only have 46 topics with roughly 4 million events in total. I'll continue to dig, but thank you for your thoughts and for chiming in!
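For anyone following along, the topic-level overrides can also be checked directly with kafka-configs, roughly like this (adjust the ZooKeeper address for your environment):

kafka-configs --zookeeper $PRD:2181 --describe --entity-type topics --entity-name ETD-TEST

which should report the same cleanup.policy=delete,retention.ms=1210000000 override that the kafka-topics --describe output further down shows.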
{"log":"[2019-10-20 03:01:41,818] INFO Processing override for entityPath: topics/ETD-TEST with config: Map(retention.ms -\u003e 1210000000) (kafka.server.DynamicConfigManager)\n","stream":"stdout","time":"2019-10-20T03:01:41.818402555Z"} So don't think the first hypothesis of it being turned on recently sticks. I'll have to investigate the other possibility you mentioned but we only have 46 topics with roughly 4 million events total. I'll continue to dig but thank you for your thoughts and chiming in! On Sun, May 24, 2020 at 8:10 PM Christopher Smith <cbsm...@gmail.com> wrote: > It's a bit weird having the default policy for your brokers being compact, > but yes, the policy for the topic overrides the broker policy. > > What you are seeing on with your example of the segment files makes sense > to me. With a 1210000000ms retention policy you've actually got 400 > seconds beyond 14 days. The default 7 day segment rotation window, along > with a low data rate to ensure you don't hit segment.bytes limits, ensures > that after 21 days, you'd have three segments, each of which would still > potentially have data within the retention window, so there would be no > deletion. So seeing May 9th data on May 21st seems reasonable to me. > > As to why you have messages going back 8 months on a topic with > retention.ms=1210000000, > I can only theorize. One case I could imagine would be if you had the > cleanup policy initially as compact, then it might have been doing > compaction for eight+ months. If so, you might have records in compacted & > merged segments that go back to the ends of time. So, if switched to a > "delete" policy recently, as long as those merged segments have even one > message in them from < 1210000000 ms ago, the delete policy will still not > consider it eligible for deletion. > > The other possibility is that the cleaner just hasn't gotten around to > deleting those segments. Given that you have "compact" as your default > policy, you might have a lot of topics that are compacted and it's busy > churning through those. > > --Chris > > On Sun, May 24, 2020 at 2:46 PM Don Ky <asandy...@gmail.com> wrote: > > > Hello all, any thoughts on this? Just pinging if anyone out there has any > > pointers on where to go from here? I was wondering if it could be that > the > > topic is actually being compacted and not a retention policy of delete? > > Could this be because that the topics were initially created with a > compact > > policy and now the delete policy isn't applicable? > > > > Thanks! > > > > On Thu, May 21, 2020 at 12:49 PM Don Ky <asandy...@gmail.com> wrote: > > > > > Hello Kafka users! 
> --Chris
>
> On Sun, May 24, 2020 at 2:46 PM Don Ky <asandy...@gmail.com> wrote:
>
> > Hello all, any thoughts on this? Just pinging to see if anyone out
> > there has pointers on where to go from here. I was wondering
> > whether the topic is actually being compacted rather than cleaned
> > up under a delete retention policy. Could it be that the topics
> > were initially created with a compact policy and the delete policy
> > now isn't being applied?
> >
> > Thanks!
> >
> > On Thu, May 21, 2020 at 12:49 PM Don Ky <asandy...@gmail.com> wrote:
> >
> > > Hello Kafka users!
> > >
> > > I'm posting because we are having an issue with topic retention
> > > that we don't understand, and we need input on what could be
> > > wrong on our end if it is indeed a configuration issue.
> > >
> > > First, we are running a containerized version of Kafka with the
> > > following environment variables:
> > >
> > > CONFLUENT_MINOR_VERSION=3
> > > KAFKA_LOG_CLEANUP_POLICY=compact
> > > KAFKA_LOG4J_LOGGERS=kafka.controller=ERROR
> > > CONFLUENT_DEB_VERSION=1
> > > KAFKA_VERSION=5.3.1
> > > CONFLUENT_PATCH_VERSION=1
> > > CONFLUENT_VERSION=5.3.1
> > > KAFKA_LOG_RETENTION_BYTES=-1
> > > PYTHON_VERSION=2.7.9-1
> > > SCALA_VERSION=2.12
> > > CONFLUENT_MAJOR_VERSION=5
> > > KAFKA_NUM_PARTITIONS=4
> > > ZULU_OPENJDK_VERSION=8=8.38.0.13
> > > KAFKA_DEFAULT_REPLICATION_FACTOR=6
> > > KAFKA_LOG_RETENTION_HOURS=2147483647
> > >
> > > I'm listing these vars because they might help someone with a
> > > keen eye spot something we are missing. Based on these, we should
> > > have a global log cleanup/retention configuration of:
> > >
> > > KAFKA_LOG_CLEANUP_POLICY=compact
> > > KAFKA_LOG_RETENTION_HOURS=2147483647 (roughly 245,000 years, i.e. effectively unlimited)
> > > KAFKA_LOG_RETENTION_BYTES=-1 (no size limit)
> > >
> > > We currently have a topic called "ETD-TEST", shown below:
> > >
> > > kafka-topics --describe --zookeeper $PRD:2181 --topic ETD-TEST
> > > Topic: ETD-TEST  PartitionCount: 4  ReplicationFactor: 6  Configs: cleanup.policy=delete,retention.ms=1210000000
> > >   Topic: ETD-TEST  Partition: 0  Leader: 1  Replicas: 1,4,5,2,3,6  Isr: 5,1,6,2,3,4
> > >   Topic: ETD-TEST  Partition: 1  Leader: 1  Replicas: 1,2,3,6,5,4  Isr: 5,1,6,2,3,4
> > >   Topic: ETD-TEST  Partition: 2  Leader: 4  Replicas: 4,5,1,6,2,3  Isr: 5,1,6,2,3,4
> > >   Topic: ETD-TEST  Partition: 3  Leader: 1  Replicas: 1,3,6,4,2,5  Isr: 5,1,6,2,3,4
> > >
> > > Notice it has a cleanup policy of delete, which should override
> > > the global setting of compact.
> > >
> > > The issue is that ETD-TEST has a topic-level retention.ms value
> > > of 1210000000 (14 days), but we have messages in the topic dating
> > > much further back, somewhere around 8 months. We only ran into
> > > this, and were surprised, when we started consuming the topic and
> > > saw close to 2 million messages.
> > >
> > > From the reading and research I've done over the last few days,
> > > retention.ms isn't necessarily a limit on how old messages/events
> > > in the topic can be, but rather on how long the segment files in
> > > the topic are kept around before getting rotated out. (Please
> > > correct me if this is false.)
> > >
> > > I also learned that segment files are rotated based on both a
> > > size and a duration value. The default segment size is 1 GB: if a
> > > segment reaches that size it gets rolled, similar to log files.
> > > segment.ms determines how old a segment can get before it is
> > > rolled, the default being 7 days. So if a log segment never
> > > reaches the size limit, it will eventually hit the 7-day
> > > segment.ms and be rolled.
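> > > (As an aside, I believe these segment settings can also be
> > > overridden per topic, so something like the following should make
> > > segments roll daily and give the delete policy smaller segments
> > > to work with; I haven't actually tried it on this cluster yet:
> > >
> > > kafka-configs --zookeeper $PRD:2181 --alter --entity-type topics --entity-name ETD-TEST --add-config segment.ms=86400000
> > >
> > > I'm mentioning it only as a possible workaround, not as an
> > > explanation for what we're seeing.)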
> > > The weird thing is that this seems to affect only this one
> > > particular topic. I've created a separate topic, flooded it with
> > > data, and set up a similar retention/cleanup policy, and it seems
> > > to be deleting fine.
> > >
> > > What adds to the mystery is the output below, which lists the
> > > segment files for this topic on one of the brokers (partition 0
> > > in this example):
> > >
> > > $ ls -lah
> > > total 642M
> > > drwxr-xr-x   2 root root 4.0K May 15 17:28 .
> > > drwxr-xr-x 281 root root  12K May 21 12:38 ..
> > > -rw-r--r--   1 root root  16K May  9 08:03 00000000000000000000.index
> > > -rw-r--r--   1 root root 277M May  9 08:03 00000000000000000000.log
> > > -rw-r--r--   1 root root  23K May  9 08:03 00000000000000000000.timeindex
> > > -rw-r--r--   1 root root  63K May 14 22:12 00000000000000507838.index
> > > -rw-r--r--   1 root root 131M May 14 22:00 00000000000000507838.log
> > > -rw-r--r--   1 root root   10 May  9 08:03 00000000000000507838.snapshot
> > > -rw-r--r--   1 root root  94K May 14 22:12 00000000000000507838.timeindex
> > > -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.index
> > > -rw-r--r--   1 root root 170M May 21 12:38 00000000000000518083.log
> > > -rw-r--r--   1 root root   10 May 14 22:12 00000000000000518083.snapshot
> > > -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.timeindex
> > > -rw-r--r--   1 root root  383 May 14 10:34 leader-epoch-checkpoint
> > >
> > > The segment files are relatively small, and there are files in
> > > there dating back to May 9, 2020; it is currently May 21, 2020 as
> > > I compose this email.
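> > > (To pin down how old the oldest retained record actually is, my
> > > plan is to read the first message along with its timestamp, along
> > > these lines, assuming the brokers listen on port 9092:
> > >
> > > kafka-console-consumer --bootstrap-server $PRD:9092 --topic ETD-TEST --from-beginning --max-messages 1 --property print.timestamp=true
> > >
> > > which should print the CreateTime of the earliest message still
> > > on disk.)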
> > > Inspecting the container logs for this one particular topic, we
> > > see this:
> > >
> > > {"log":"[2020-04-25 12:04:25,685] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685214246Z"}
> > > {"log":"[2020-04-25 12:04:25,685] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685370948Z"}
> > > {"log":"[2020-04-25 12:04:25,685] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.68563615Z"}
> > > {"log":"[2020-04-25 12:04:25,686] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686287357Z"}
> > > {"log":"[2020-04-25 12:04:25,686] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686376758Z"}
> > > {"log":"[2020-04-25 12:04:25,686] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686436058Z"}
> > > {"log":"[2020-04-25 12:04:25,688] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688237176Z"}
> > > {"log":"[2020-04-25 12:04:25,688] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688251276Z"}
> > > {"log":"[2020-04-25 12:04:25,688] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688332877Z"}
> > > {"log":"[2020-05-09 12:03:35,810] INFO Cleaner 0: Cleaning segment 0 in log ETD-TEST-0 (largest timestamp Sat Apr 25 11:41:13 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:35.810676351Z"}
> > > {"log":"[2020-05-09 12:03:37,924] INFO Cleaner 0: Cleaning segment 483496 in log ETD-TEST-0 (largest timestamp Sat May 02 12:00:29 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:37.924653038Z"}
> > > {"log":"[2020-05-09 12:03:38,001] INFO Cleaner 0: Cleaning segment 495708 in log ETD-TEST-0 (largest timestamp Sat May 09 12:03:28 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:38.001662613Z"}
> > > {"log":"[2020-05-09 12:04:40,354] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355179461Z"}
> > > {"log":"[2020-05-09 12:04:40,355] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355208261Z"}
> > >
> > > What's weird is that the log cleaner says it's deleting the logs,
> > > but they don't appear to actually be gone. Also, what's the deal
> > > with the "retaining deletes" line?
> > >
> > > So based on this, can anyone tell us whether there's anything
> > > funky with our configuration, or is this potentially a bug in
> > > Kafka?
> > >
> > > Thanks for any input!
>
> --
> Chris