Hi, I am looking for a reliable, production-safe strategy to avoid losing unread messages when a Kafka broker remains down longer than the topic's configured retention.ms.
Since Kafka deletes segments purely based on timestamps, if a broker is down for, say, 24 hours and the topic's retention.ms is also 24 hours, the broker may start deleting those segments almost immediately after it comes back up, even if no consumer has read the messages yet. (As far as I understand, the retention check runs every log.retention.check.interval.ms, which defaults to 5 minutes, so the window after startup is short.) Is there a recommended way to prevent message loss in this scenario?

I am running Kafka on Kubernetes using Strimzi, so all topic configuration is managed through KafkaTopic CRDs and the Topic Operator.

One solution could be to increase the topic's retention, but for that to work I would need to ensure the change is applied before Kafka deletes the log segments. Could something be done during startup? For example, with a 3-broker cluster, I could prevent the brokers from fully starting after the first pod comes up, update the retention values in the Strimzi Kafka CR, and then let the operator complete the rollout so the cluster restarts with the new retention.

Is this safe, or is there a better recommended approach to ensure that unread messages are preserved after long broker downtime? I have sketched below roughly what I mean by the retention change.
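For context, this is a minimal sketch of the kind of KafkaTopic resource I manage; the topic name, cluster name, namespace, and sizing below are placeholders:

    # Minimal KafkaTopic managed by the Strimzi Topic Operator
    # (names and values are placeholders for illustration).
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: my-topic
      namespace: kafka
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 3
      replicas: 3
      config:
        retention.ms: 86400000  # 24 hours; older segments become eligible for deletion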
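And this is roughly the change I would need to land before the broker's retention check deletes anything, here bumping retention to 7 days (the value and names are again just examples):

    # Sketch only: extend retention.ms on the topic before the rollout completes,
    # so unread segments are no longer past retention when the brokers come up.
    kubectl patch kafkatopic my-topic -n kafka --type merge \
      -p '{"spec":{"config":{"retention.ms":604800000}}}'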
Regards,
Prateek Kohli