[ https://issues.apache.org/jira/browse/KAFKA-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Divij Vaidya updated KAFKA-18570:
---------------------------------
    Component/s: documentation

> Update documentation for log loading metrics during Kafka broker startup
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-18570
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18570
>             Project: Kafka
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 3.3.0
>            Reporter: Mehari Beyene
>            Assignee: Mehari Beyene
>            Priority: Major
>             Fix For: 4.1.0
>
>
> When a Kafka broker process starts up, it restores the broker's state from
> the segment files stored on disk and from the auxiliary checkpoint files
> used to store the broker's state.
> After a clean shutdown, all state has been persisted to the local disk, so
> restoring the broker's state is relatively quick (estimated under 10 minutes
> for a partition count of 4000).
> However, if the broker experiences an unclean shutdown, log loading also
> involves recovering the broker state by replaying messages and trying to
> reconstruct the last known safe state of the broker. This recovery process
> can take a very long time. Anecdotally, we have seen recoveries that took
> more than two hours.
> Log recovery is triggered as part of log loading. During this recovery
> process, there is no metric that indicates the progress, leaving both Kafka
> cluster administrators and customers blind to the state of the recovery.
> Without a metric that operators can use to estimate the ETA, planning and
> managing expectations is difficult.
> The exit criterion for this issue is to add a metric that shows the
> progress of log loading when a broker starts up.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)