In production-grade clusters, downtime of a single broker shouldn't prevent
consumers from reading messages and catching up on offsets. What
replication factor are you using?
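
With replication factor 3 and min.insync.replicas set to 2, a single broker
outage leaves every partition fully available. A minimal sketch of the
relevant KafkaTopic spec fragment (values illustrative):

  spec:
    partitions: 3
    replicas: 3              # replication factor 3: data survives one broker loss
    config:
      min.insync.replicas: 2 # acks=all writes still succeed with one replica down

With that in place, consumers keep reading from the remaining in-sync
replicas while the broker is down, so there is no unread backlog racing
against retention when it comes back.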

On Wed, Nov 12, 2025 at 10:44 AM Prateek Kohli
<[email protected]> wrote:

> Hi,
>
> I am looking for a reliable, production-safe strategy to avoid losing
> unread messages when a Kafka broker remains down longer than the topic's
> configured retention.ms.
>
> Since Kafka deletes segments purely based on timestamps, if a broker is
> down for (for example) 24 hours and the topic's retention.ms is also 24
> hours, the broker may start deleting segments immediately on startup, even
> if no consumers have read those messages yet.
>
> Is there a recommended way to prevent message loss in this scenario?
>
> I am running Kafka on Kubernetes using Strimzi, so all topic
> configurations are managed through KafkaTopic CRDs and the Topic Operator.
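>
> For concreteness, the at-risk topic would look roughly like this as a
> KafkaTopic CRD (topic and cluster names are placeholders):
>
>   apiVersion: kafka.strimzi.io/v1beta2
>   kind: KafkaTopic
>   metadata:
>     name: my-topic
>     labels:
>       strimzi.io/cluster: my-cluster
>   spec:
>     partitions: 3
>     replicas: 3
>     config:
>       retention.ms: 86400000  # 24 hours, matching the downtime in question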
>
> One solution could be to alter the topic's retention configuration. But for
> that to work, I would need to ensure the change is applied before Kafka
> deletes the log segments. So could something be done during startup?
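>
> One concrete option would be to temporarily disable time-based deletion in
> the KafkaTopic CRD and revert it once consumers have caught up (a sketch;
> retention.ms of -1 means retain indefinitely):
>
>   spec:
>     config:
>       retention.ms: -1  # no time-based deletion until this is reverted
>
> But the Topic Operator can only reconcile that change once the brokers are
> reachable, which is exactly the race I am worried about.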
>
> For example, with a 3-broker cluster, I could prevent the brokers from
> fully starting after the first pod comes up, update the retention values in
> the Strimzi Kafka CR, and then let the operator complete the rollout so the
> cluster restarts with the new retention. Is this safe, or is there a better
> recommended approach to ensure that unread messages are preserved after
> long broker downtime?
>
> Regards,
> Prateek Kohli
>