Hello,

Yesterday I had to replace a faulty Kafka broker node. My replacement method
was to bring up a blank node reusing the old broker's ID, which triggered
re-replication of all of its old partitions.

Today I was dealing with disk usage alerts for that broker alone: it turned
out that the broker was not deleting old logs like the rest of the nodes were.

I haven't checked the code, but I eventually came to the conclusion that
Kafka's log segment deletion is based on the file's create/modified time,
rather than on the maximum produce timestamp of the messages within the
segment itself.

This makes my method of replacing a faulty node with a blank slate
problematic, since five-day-old messages end up stored in files with a
recent c/mtime, so they won't be deleted and will soon cause disk space
exhaustion.

My temporary workaround was to reduce retention on the largest topic to 24
hours, but I'd prefer not to do that, since it's more manual work and it
breaks my SLA.
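For reference, the override I applied was the topic-level retention config,
along these lines (topic name and ZooKeeper address are placeholders, and
the exact tool syntax varies by Kafka version):

```shell
# Lower retention on the largest topic to 24 hours (86400000 ms).
# Syntax shown is for ZooKeeper-era brokers; newer releases do this via
# kafka-configs.sh --bootstrap-server ... --alter instead.
bin/kafka-topics.sh --zookeeper zk-host:2181 --alter \
  --topic my-big-topic --config retention.ms=86400000
```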

Can this behaviour of Kafka be changed via configs at all?

Has anyone faced a similar problem, and do you have any suggestions?

Thanks,
Gwilym
