In most use cases, Kafka serves as a messaging middleware where messages that have already been consumed are typically no longer needed and can be safely deleted. Therefore, I propose enhancing the threshold strategy with an automatic deletion feature:
When a broker's disk usage reaches 95%, it should automatically delete the oldest 10% of messages on the node to free up disk space, allowing new messages to be produced. This eliminates the need for manual cleanup while ensuring that new messages (which are almost always more critical than already-consumed data) take priority.

The benefits: disk-full scenarios are prevented by automatically removing stale data, no admin intervention is required for basic cleanup, and fresh messages are never blocked by obsolete ones.

The only potential risk arises when consumer groups lag significantly, in which case unconsumed messages might be deleted prematurely. However, in such cases the root issue is the backlog itself; teams should prioritize resolving the lag rather than relying on retention.

To accommodate different needs, we could introduce a `disk.threshold.policy` parameter, allowing users to choose between:

1. Rejecting new messages
2. Auto-deleting the oldest messages

Best regards

mapan <mapan0...@gmail.com> wrote on Thu, Jul 31, 2025 at 8:18 PM:

> Hi all,
>
> I'd like to start a discussion about a new KIP:
> https://cwiki.apache.org/confluence/x/Nw9JFg
>
> This KIP suggests adding disk threshold configs in Kafka and rejecting new
> produce requests after reaching the threshold to prevent disk full failure.
>
> This strategy is similar to RocketMQ's diskMaxUsedSpaceRatio config or
> RabbitMQ's disk_free_limit config, and I hope to implement this strategy
> in our environment.
>
> Please share your feedback, questions, or concerns so we can refine
> the proposal together.
>
> Best regards,
> mapan
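
P.S. To make the two `disk.threshold.policy` options in my reply concrete, here is a rough Python sketch of the decision a broker would make on each produce request. It is not Kafka code: the `DiskThresholdPolicy` enum, the `Segment` class, and both functions are hypothetical names used only for illustration, and I am assuming deletion would happen at whole-segment granularity, so "the oldest 10%" is approximated by dropping the oldest segments until about 10% of the stored bytes are covered.

```python
from dataclasses import dataclass
from enum import Enum


class DiskThresholdPolicy(Enum):
    """Hypothetical values for the proposed `disk.threshold.policy` setting."""
    REJECT = "reject"  # option 1: reject new messages
    DELETE = "delete"  # option 2: auto-delete the oldest messages


@dataclass
class Segment:
    """Illustrative stand-in for a log segment; not Kafka's real class."""
    name: str
    created_ms: int
    size_bytes: int


def segments_to_delete(segments, fraction=0.10):
    """Pick the oldest segments until about `fraction` of all bytes is covered."""
    target = fraction * sum(s.size_bytes for s in segments)
    chosen, freed = [], 0
    for seg in sorted(segments, key=lambda s: s.created_ms):
        if freed >= target:
            break
        chosen.append(seg)
        freed += seg.size_bytes
    return chosen


def on_produce(policy, used_bytes, capacity_bytes, segments, threshold=0.95):
    """Return (accept_request, segments_to_remove) for one produce request."""
    if used_bytes / capacity_bytes < threshold:
        return True, []  # below the threshold: nothing special happens
    if policy is DiskThresholdPolicy.REJECT:
        return False, []  # option 1: refuse the write, keep all data
    return True, segments_to_delete(segments)  # option 2: make room, accept
```

Because whole segments are removed, the sketch can free more than 10% when segments are large relative to the data on disk; the 95% and 10% values are the ones from my proposal and would presumably be configurable.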