[ https://issues.apache.org/jira/browse/KAFKA-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Lu updated KAFKA-7300:
----------------------------
    Description: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-356%3A+Add+KafkaConsumer+fetch-error-rate+and+fetch-error-total+metrics

The KafkaConsumer is a complex client that requires many different components to function properly. When a consumer fails, it can be difficult to identify the root cause and which component failed (ConsumerCoordinator, Fetcher, ConsumerNetworkClient, etc).

This aims to improve the monitoring and detection of KafkaConsumer’s Fetcher component.

Fetcher will send a fetch request for each node that the consumer has assigned partitions for.

This fetch request may fail under the following cases:
 * Intermittent network issues (goes to onFailure)
 * Node sent an invalid full/incremental fetch response (FetchSessionHandler’s handleResponse returns false)
 * FetchSessionIdNotFound
 * InvalidFetchSessionEpochException

These cases are logged, but it would be valuable to provide a corresponding metric that allows for monitoring and alerting.

  was:
The KafkaConsumer is a complex client that requires many different components to function properly. When a consumer fails, it can be difficult to identify the root cause and which component failed (ConsumerCoordinator, Fetcher, ConsumerNetworkClient, etc).

This aims to improve the monitoring and detection of KafkaConsumer’s Fetcher component.

Fetcher will send a fetch request for each node that the consumer has assigned partitions for.

This fetch request may fail under the following cases:
 * Intermittent network issues (goes to onFailure)
 * Node sent an invalid full/incremental fetch response (FetchSessionHandler’s handleResponse returns false)
 * FetchSessionIdNotFound
 * InvalidFetchSessionEpochException

These cases are logged, but it would be valuable to provide a corresponding metric that allows for monitoring and alerting.

> Add KafkaConsumer fetch-error-rate and fetch-error-total metrics
> -----------------------------------------------------------------
>
>                 Key: KAFKA-7300
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7300
>             Project: Kafka
>          Issue Type: New Feature
>          Components: clients, consumer, metrics
>            Reporter: Kevin Lu
>            Assignee: Kevin Lu
>            Priority: Minor
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-356%3A+Add+KafkaConsumer+fetch-error-rate+and+fetch-error-total+metrics
>
> The KafkaConsumer is a complex client that requires many different components
> to function properly. When a consumer fails, it can be difficult to identify
> the root cause and which component failed (ConsumerCoordinator, Fetcher,
> ConsumerNetworkClient, etc).
>
> This aims to improve the monitoring and detection of KafkaConsumer’s Fetcher
> component.
>
> Fetcher will send a fetch request for each node that the consumer has
> assigned partitions for.
>
> This fetch request may fail under the following cases:
>  * Intermittent network issues (goes to onFailure)
>  * Node sent an invalid full/incremental fetch response
> (FetchSessionHandler’s handleResponse returns false)
>  * FetchSessionIdNotFound
>  * InvalidFetchSessionEpochException
>
> These cases are logged, but it would be valuable to provide a corresponding
> metric that allows for monitoring and alerting.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
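
For readers unfamiliar with how a rate/total metric pair behaves, the following is a minimal, dependency-free sketch of the idea behind fetch-error-rate and fetch-error-total. It is not the Fetcher code or Kafka's metrics API: the class name FetchErrorMetrics, its methods, and the trailing-window rate calculation are all hypothetical and chosen only for illustration. It assumes one recordError call per failed fetch request (any of the failure cases listed in the description) and non-decreasing timestamps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a rate/total error metric pair; not Kafka's implementation. */
final class FetchErrorMetrics {
    private final long windowMs;                                  // length of the trailing rate window
    private final Deque<Long> recentErrors = new ArrayDeque<>();  // timestamps still inside the window
    private long total;                                           // cumulative error count

    FetchErrorMetrics(long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.windowMs = windowMs;
    }

    /** Record one failed fetch request observed at nowMs (assumed non-decreasing). */
    synchronized void recordError(long nowMs) {
        total++;
        recentErrors.addLast(nowMs);
        evict(nowMs);
    }

    /** Analogue of fetch-error-total: errors recorded since creation. */
    synchronized long fetchErrorTotal() {
        return total;
    }

    /** Analogue of fetch-error-rate: errors per second over the trailing window. */
    synchronized double fetchErrorRate(long nowMs) {
        evict(nowMs);
        return recentErrors.size() * 1000.0 / windowMs;
    }

    /** Drop timestamps that have aged out of the window ending at nowMs. */
    private void evict(long nowMs) {
        while (!recentErrors.isEmpty() && recentErrors.peekFirst() <= nowMs - windowMs) {
            recentErrors.removeFirst();
        }
    }

    public static void main(String[] args) {
        FetchErrorMetrics metrics = new FetchErrorMetrics(30_000L);
        long now = System.currentTimeMillis();
        metrics.recordError(now);  // e.g. called from a request failure callback
        System.out.println("total=" + metrics.fetchErrorTotal()
                + " rate=" + metrics.fetchErrorRate(now));
    }
}
```

The total only ever grows, which suits dashboards and post-mortems, while the rate decays back to zero once errors stop, which suits alert thresholds. A real client would register such values with its metrics registry instead of exposing them through ad hoc getters.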