[ https://issues.apache.org/jira/browse/KAFKA-18216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946823#comment-17946823 ]
Lianet Magrans commented on KAFKA-18216:
----------------------------------------

Hey [~frankvicky]! I don't have much context on exactly how this was reproduced with the command line tool (just what's on this Jira), but I do know that the actual symptom was spikes in the consumer lag reported by this metric:
[https://github.com/apache/kafka/blob/8b4560e3f0f8e6cc16fe9c4c6eac95d6ae9b7c51/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L189]

So we could start by monitoring that metric while running the load, to see whether we can reproduce the spikes only in the new consumer?

Initially, the investigation led us to think that the ups and downs were caused by the HWM/LSO being updated at a different cadence in the new consumer. I imagine the logs that would support that are the ones you mentioned (which come from here [https://github.com/apache/kafka/blob/8b4560e3f0f8e6cc16fe9c4c6eac95d6ae9b7c51/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L280-L288] ), and also these two:
[https://github.com/apache/kafka/blob/8b4560e3f0f8e6cc16fe9c4c6eac95d6ae9b7c51/clients/src/main/java/org/apache/kafka/clients/consumer/internals/OffsetFetcherUtils.java#L295-L298]

Hope it helps! I haven't reproduced this myself either, but I can definitely jump in to help as soon as I get some bandwidth. Let's stay in touch.

Thanks!
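To make the suspected mechanism concrete, here is a toy model of it. This is only a sketch under stated assumptions, not the client code: the class, method, and parameter names are hypothetical, and the fixed refresh interval is a simplification of whatever cadence the background thread actually has. It computes lag as HWM minus fetch position (as the issue describes) and compares a consumer that refreshes its HWM on every fetch with one that refreshes it only on every third fetch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the Kafka client implementation.
final class LagSketch {

    // Lag as described in the issue: HWM (or LSO) minus the current fetch position.
    static long lag(long highWatermark, long position) {
        return highWatermark - position;
    }

    // Records one lag sample per consumed batch.
    // hwmRefreshEvery = 1 models an HWM refreshed on every fetch;
    // larger values model an HWM that is sometimes stale when lag is recorded.
    static List<Long> simulate(int batches, long batchSize, long backlog, int hwmRefreshEvery) {
        List<Long> recorded = new ArrayList<>();
        long logEnd = backlog;   // broker-side end of the log
        long knownHwm = backlog; // HWM as last seen by the consumer
        long position = 0;       // consumer fetch position
        for (int i = 1; i <= batches; i++) {
            logEnd += batchSize;   // producer appends one batch
            position += batchSize; // consumer reads one batch
            if (i % hwmRefreshEvery == 0) {
                knownHwm = logEnd; // HWM refreshed only on some fetches
            }
            recorded.add(lag(knownHwm, position));
        }
        return recorded;
    }

    public static void main(String[] args) {
        System.out.println("refreshed every fetch:     " + simulate(6, 100, 1000, 1));
        System.out.println("refreshed every 3rd fetch: " + simulate(6, 100, 1000, 3));
    }
}
```

In this model the true lag is constant (the producer and consumer move at the same rate), so any sawtooth in the recorded values comes purely from when the HWM is refreshed relative to when lag is recorded. If the real metric shows a similar pattern only with the new consumer, that would be consistent with the update-cadence theory, though it would not prove it.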
> High water mark or last stable offset aren't always updated after a fetch request is completed
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-18216
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18216
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, consumer
>            Reporter: Philip Nee
>            Assignee: TengYao Chi
>            Priority: Minor
>              Labels: consumer-threading-refactor
>             Fix For: 4.1.0
>
>
> We've noticed that AsyncKafkaConsumer doesn't always update the high water
> mark/LSO after handling a successful fetch response. We know that the
> consumer lag metric is calculated as HWM/LSO - current fetched position. We
> suspect this could have a subtle effect on how consumer lag is recorded,
> which might slightly affect the accuracy of client metrics reporting.
> The consumer records consumer lag when reading the fetched records.
> The consumer updates the HWM/LSO when the background thread completes the
> fetch request.
> In the original implementation, the fetcher consistently updates the HWM/LSO
> after handling the completed fetch request.
> In the new implementation, due to the async threading model, we can't
> guarantee the order of these events.
> This defect affects neither performance nor correctness and is therefore
> marked as "Minor".
>
> This can be easily reproduced using the java-produce-consumer-demo.sh
> example. Make sure to produce enough records (I use 200000000 records; fewer
> is fine as well). Custom logging is required.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)