Thanks Christopher,

Hrm, I checked the logs, and retention.ms has actually been set since somewhere around October of last year:

{"log":"[2019-10-20 03:01:41,818] INFO Processing override for entityPath: topics/ETD-TEST with config: Map(retention.ms -\u003e 1210000000) (kafka.server.DynamicConfigManager)\n","stream":"stdout","time":"2019-10-20T03:01:41.818402555Z"}

So I don't think the first hypothesis, that it was turned on recently, sticks. I'll have to investigate the other possibility you mentioned, but we only have 46 topics with roughly 4 million events in total. I'll continue to dig, but thank you for your thoughts and for chiming in!
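For anyone following along, the topic-level overrides can also be checked directly with kafka-configs, roughly like this (adjust the ZooKeeper address for your environment):

kafka-configs --zookeeper $PRD:2181 --describe --entity-type topics --entity-name ETD-TEST

which should report the same cleanup.policy=delete,retention.ms=1210000000 override that the kafka-topics --describe output further down shows.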
{"log":"[2019-10-20 03:01:41,818] INFO Processing override for entityPath: topics/ETD-TEST with config: Map(retention.ms -\u003e 1210000000) (kafka.server.DynamicConfigManager)\n","stream":"stdout","time":"2019-10-20T03:01:41.818402555Z"} So don't think the first hypothesis of it being turned on recently sticks. I'll have to investigate the other possibility you mentioned but we only have 46 topics with roughly 4 million events total. I'll continue to dig but thank you for your thoughts and chiming in! On Sun, May 24, 2020 at 8:10 PM Christopher Smith <cbsm...@gmail.com> wrote: > It's a bit weird having the default policy for your brokers being compact, > but yes, the policy for the topic overrides the broker policy. > > What you are seeing on with your example of the segment files makes sense > to me. With a 1210000000ms retention policy you've actually got 400 > seconds beyond 14 days. The default 7 day segment rotation window, along > with a low data rate to ensure you don't hit segment.bytes limits, ensures > that after 21 days, you'd have three segments, each of which would still > potentially have data within the retention window, so there would be no > deletion. So seeing May 9th data on May 21st seems reasonable to me. > > As to why you have messages going back 8 months on a topic with > retention.ms=1210000000, > I can only theorize. One case I could imagine would be if you had the > cleanup policy initially as compact, then it might have been doing > compaction for eight+ months. If so, you might have records in compacted & > merged segments that go back to the ends of time. So, if switched to a > "delete" policy recently, as long as those merged segments have even one > message in them from < 1210000000 ms ago, the delete policy will still not > consider it eligible for deletion. > > The other possibility is that the cleaner just hasn't gotten around to > deleting those segments. Given that you have "compact" as your default > policy, you might have a lot of topics that are compacted and it's busy > churning through those. > > --Chris > > On Sun, May 24, 2020 at 2:46 PM Don Ky <asandy...@gmail.com> wrote: > > > Hello all, any thoughts on this? Just pinging if anyone out there has any > > pointers on where to go from here? I was wondering if it could be that > the > > topic is actually being compacted and not a retention policy of delete? > > Could this be because that the topics were initially created with a > compact > > policy and now the delete policy isn't applicable? > > > > Thanks! > > > > On Thu, May 21, 2020 at 12:49 PM Don Ky <asandy...@gmail.com> wrote: > > > > > Hello Kafka users! 
> --Chris
>
> On Sun, May 24, 2020 at 2:46 PM Don Ky <asandy...@gmail.com> wrote:
>
> > Hello all, any thoughts on this? Just pinging to see if anyone out
> > there has pointers on where to go from here. I was wondering
> > whether the topic is actually being compacted rather than cleaned
> > up under a delete retention policy. Could it be that the topics
> > were initially created with a compact policy and the delete policy
> > now isn't being applied?
> >
> > Thanks!
> >
> > On Thu, May 21, 2020 at 12:49 PM Don Ky <asandy...@gmail.com> wrote:
> >
> > > Hello Kafka users!
> > >
> > > I'm posting because we are having an issue with topic retention
> > > that we don't understand, and we need input on what could be
> > > wrong on our end if it is indeed a configuration issue.
> > >
> > > First, we are running a containerized version of Kafka with the
> > > following environment variables:
> > >
> > > CONFLUENT_MINOR_VERSION=3
> > > KAFKA_LOG_CLEANUP_POLICY=compact
> > > KAFKA_LOG4J_LOGGERS=kafka.controller=ERROR
> > > CONFLUENT_DEB_VERSION=1
> > > KAFKA_VERSION=5.3.1
> > > CONFLUENT_PATCH_VERSION=1
> > > CONFLUENT_VERSION=5.3.1
> > > KAFKA_LOG_RETENTION_BYTES=-1
> > > PYTHON_VERSION=2.7.9-1
> > > SCALA_VERSION=2.12
> > > CONFLUENT_MAJOR_VERSION=5
> > > KAFKA_NUM_PARTITIONS=4
> > > ZULU_OPENJDK_VERSION=8=8.38.0.13
> > > KAFKA_DEFAULT_REPLICATION_FACTOR=6
> > > KAFKA_LOG_RETENTION_HOURS=2147483647
> > >
> > > I'm listing these vars because they might help someone with a
> > > keen eye spot something we are missing. Based on these, we should
> > > have a global log cleanup/retention configuration of:
> > >
> > > KAFKA_LOG_CLEANUP_POLICY=compact
> > > KAFKA_LOG_RETENTION_HOURS=2147483647 (roughly 245,000 years, i.e. effectively unlimited)
> > > KAFKA_LOG_RETENTION_BYTES=-1 (no size limit)
> > >
> > > We currently have a topic called "ETD-TEST", shown below:
> > >
> > > kafka-topics --describe --zookeeper $PRD:2181 --topic ETD-TEST
> > > Topic: ETD-TEST  PartitionCount: 4  ReplicationFactor: 6  Configs: cleanup.policy=delete,retention.ms=1210000000
> > >   Topic: ETD-TEST  Partition: 0  Leader: 1  Replicas: 1,4,5,2,3,6  Isr: 5,1,6,2,3,4
> > >   Topic: ETD-TEST  Partition: 1  Leader: 1  Replicas: 1,2,3,6,5,4  Isr: 5,1,6,2,3,4
> > >   Topic: ETD-TEST  Partition: 2  Leader: 4  Replicas: 4,5,1,6,2,3  Isr: 5,1,6,2,3,4
> > >   Topic: ETD-TEST  Partition: 3  Leader: 1  Replicas: 1,3,6,4,2,5  Isr: 5,1,6,2,3,4
> > >
> > > Notice it has a cleanup policy of delete, which should override
> > > the global setting of compact.
> > >
> > > The issue is that ETD-TEST has a topic-level retention.ms value
> > > of 1210000000 (14 days), but we have messages in the topic dating
> > > much further back, somewhere around 8 months. We only ran into
> > > this, and were surprised, when we started consuming the topic and
> > > saw close to 2 million messages.
> > >
> > > From the reading and research I've done over the last few days,
> > > retention.ms isn't necessarily a limit on how old messages/events
> > > in the topic can be, but rather on how long the segment files in
> > > the topic are kept around before getting rotated out. (Please
> > > correct me if this is false.)
> > >
> > > I also learned that segment files are rotated based on both a
> > > size and a duration value. The default segment size is 1 GB: if a
> > > segment reaches that size it gets rolled, similar to log files.
> > > segment.ms determines how old a segment can get before it is
> > > rolled, the default being 7 days. So if a log segment never
> > > reaches the size limit, it will eventually hit the 7-day
> > > segment.ms and be rolled.
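> > > (As an aside, I believe these segment settings can also be
> > > overridden per topic, so something like the following should make
> > > segments roll daily and give the delete policy smaller segments
> > > to work with; I haven't actually tried it on this cluster yet:
> > >
> > > kafka-configs --zookeeper $PRD:2181 --alter --entity-type topics --entity-name ETD-TEST --add-config segment.ms=86400000
> > >
> > > I'm mentioning it only as a possible workaround, not as an
> > > explanation for what we're seeing.)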
> > > The weird thing is that this seems to affect only this one
> > > particular topic. I've created a separate topic, flooded it with
> > > data, and set up a similar retention/cleanup policy, and it seems
> > > to be deleting fine.
> > >
> > > What adds to the mystery is the output below, which lists the
> > > segment files for this topic on one of the brokers (partition 0
> > > in this example):
> > >
> > > $ ls -lah
> > > total 642M
> > > drwxr-xr-x   2 root root 4.0K May 15 17:28 .
> > > drwxr-xr-x 281 root root  12K May 21 12:38 ..
> > > -rw-r--r--   1 root root  16K May  9 08:03 00000000000000000000.index
> > > -rw-r--r--   1 root root 277M May  9 08:03 00000000000000000000.log
> > > -rw-r--r--   1 root root  23K May  9 08:03 00000000000000000000.timeindex
> > > -rw-r--r--   1 root root  63K May 14 22:12 00000000000000507838.index
> > > -rw-r--r--   1 root root 131M May 14 22:00 00000000000000507838.log
> > > -rw-r--r--   1 root root   10 May  9 08:03 00000000000000507838.snapshot
> > > -rw-r--r--   1 root root  94K May 14 22:12 00000000000000507838.timeindex
> > > -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.index
> > > -rw-r--r--   1 root root 170M May 21 12:38 00000000000000518083.log
> > > -rw-r--r--   1 root root   10 May 14 22:12 00000000000000518083.snapshot
> > > -rw-r--r--   1 root root  10M May 21 12:38 00000000000000518083.timeindex
> > > -rw-r--r--   1 root root  383 May 14 10:34 leader-epoch-checkpoint
> > >
> > > The segment files are relatively small, and there are files in
> > > there dating back to May 9, 2020; it is currently May 21, 2020 as
> > > I compose this email.
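> > > (To pin down how old the oldest retained record actually is, my
> > > plan is to read the first message along with its timestamp, along
> > > these lines, assuming the brokers listen on port 9092:
> > >
> > > kafka-console-consumer --bootstrap-server $PRD:9092 --topic ETD-TEST --from-beginning --max-messages 1 --property print.timestamp=true
> > >
> > > which should print the CreateTime of the earliest message still
> > > on disk.)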
> > > Inspecting the container logs for this one particular topic, we
> > > see this:
> > >
> > > {"log":"[2020-04-25 12:04:25,685] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685214246Z"}
> > > {"log":"[2020-04-25 12:04:25,685] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.685370948Z"}
> > > {"log":"[2020-04-25 12:04:25,685] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.68563615Z"}
> > > {"log":"[2020-04-25 12:04:25,686] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686287357Z"}
> > > {"log":"[2020-04-25 12:04:25,686] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686376758Z"}
> > > {"log":"[2020-04-25 12:04:25,686] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000460418.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.686436058Z"}
> > > {"log":"[2020-04-25 12:04:25,688] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688237176Z"}
> > > {"log":"[2020-04-25 12:04:25,688] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688251276Z"}
> > > {"log":"[2020-04-25 12:04:25,688] INFO Deleted time index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000471726.timeindex.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-04-25T12:04:25.688332877Z"}
> > > {"log":"[2020-05-09 12:03:35,810] INFO Cleaner 0: Cleaning segment 0 in log ETD-TEST-0 (largest timestamp Sat Apr 25 11:41:13 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:35.810676351Z"}
> > > {"log":"[2020-05-09 12:03:37,924] INFO Cleaner 0: Cleaning segment 483496 in log ETD-TEST-0 (largest timestamp Sat May 02 12:00:29 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:37.924653038Z"}
> > > {"log":"[2020-05-09 12:03:38,001] INFO Cleaner 0: Cleaning segment 495708 in log ETD-TEST-0 (largest timestamp Sat May 09 12:03:28 UTC 2020) into 0, retaining deletes. (kafka.log.LogCleaner)\n","stream":"stdout","time":"2020-05-09T12:03:38.001662613Z"}
> > > {"log":"[2020-05-09 12:04:40,354] INFO Deleted log /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.log.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355179461Z"}
> > > {"log":"[2020-05-09 12:04:40,355] INFO Deleted offset index /usr/data/kafka/kafka-logs/ETD-TEST-0/00000000000000000000.index.deleted. (kafka.log.LogSegment)\n","stream":"stdout","time":"2020-05-09T12:04:40.355208261Z"}
> > >
> > > What's weird is that the log cleaner says it's deleting the logs,
> > > but they don't appear to actually be gone. Also, what's the deal
> > > with the "retaining deletes" line?
> > >
> > > So based on this, can anyone tell us whether there's anything
> > > funky with our configuration, or is this potentially a bug in
> > > Kafka?
> > >
> > > Thanks for any input!
>
> --
> Chris