https://issues.apache.org/jira/browse/KAFKA-4669?focusedCommentId=16271727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16271727 Can we reopen this issue ?
We saw this on the consumer in production today. We are on 0.11.01 ERROR c.v.v.h.m.k.KafkaEventConsumerDelegate- Error fetching next record java.lang.Exception: Error fetching next new record from kafka queue at com.vmware.vchs.hybridity.messaging.kafka.KafkaEventConsumerDelegate.getNextEventWithTimeout(KafkaEventConsumerDelegate.java:121) at com.vmware.vchs.hybridity.messaging.kafka.KafkaEventConsumerDelegate.getNextEvent(KafkaEventConsumerDelegate.java:64) at com.vmware.vchs.hybridity.messaging.adapter.EventListenerAdapter.getNextEvent(EventListenerAdapter.java:76) at com.vmware.vchs.hybridity.messaging.adapter.EventListenerAdapter.getNextEventAndAck(EventListenerAdapter.java:94) at com.vmware.vchs.hybridity.messaging.adapter.EventListenerAdapter$1.run(EventListenerAdapter.java:125) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalStateException: Correlation id for response (386681) does not match request (386680), request header: {api_key=9,api_version=3,correlation_id=386680,client_id=consumer-36} at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:752) at org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:561) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:168) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.fetchCommittedOffsets(ConsumerCoordinator.java:477) at org.apache.kafka.clients.consumer.KafkaConsumer.committed(KafkaConsumer.java:1346) at com.vmware.vchs.hybridity.messaging.kafka.KafkaEventConsumerDelegate.getNextEventWithTimeout(KafkaEventConsumerDelegate.java:86) ... 5 common frames omitted On Mon, Mar 6, 2017 at 9:44 AM, Ismael Juma <ism...@juma.me.uk> wrote: > Hi Mickael, > > This looks to be the same as KAFKA-4669. In theory, this should never > happen and it's unclear when/how it can happen. Not sure if someone has > investigated it in more detail. > > Ismael > > On Mon, Mar 6, 2017 at 5:15 PM, Mickael Maison <mickael.mai...@gmail.com> > wrote: > > > Hi, > > > > In one of our clusters, some of our clients occasionally see this > > exception: > > java.lang.IllegalStateException: Correlation id for response (4564) > > does not match request (4562) > > at org.apache.kafka.clients.NetworkClient.correlate( > > NetworkClient.java:486) > > at org.apache.kafka.clients.NetworkClient.parseResponse( > > NetworkClient.java:381) > > at org.apache.kafka.clients.NetworkClient.handleCompletedReceives( > > NetworkClient.java:449) > > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) > > at org.apache.kafka.clients.producer.internals.Sender.run( > Sender.java:229) > > at org.apache.kafka.clients.producer.internals.Sender.run( > Sender.java:134) > > at java.lang.Thread.run(Unknown Source) > > > > We've also seen it from consumer poll() and commit() > > > > Usually the response's correlation id is off by just 1 or 2 (like > > above) but we've also seen it off by a few hundreds: > > java.lang.IllegalStateException: Correlation id for response (742) > > does not match request (174) > > at org.apache.kafka.clients.NetworkClient.correlate( > > NetworkClient.java:486) > > at org.apache.kafka.clients.NetworkClient.parseResponse( > > NetworkClient.java:381) > > at org.apache.kafka.clients.NetworkClient.handleCompletedReceives( > > NetworkClient.java:449) > > at org.apache.kafka.clients.NetworkClient.poll( > NetworkClient.java:269) > > at org.apache.kafka.clients.consumer.internals. > ConsumerNetworkClient. > > clientPoll(ConsumerNetworkClient.java:360) > > at org.apache.kafka.clients.consumer.internals. > > ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > > at org.apache.kafka.clients.consumer.internals. > > ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > > at org.apache.kafka.clients.consumer.internals. > > ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > > at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator. > > commitOffsetsSync(ConsumerCoordinator.java:426) > > at org.apache.kafka.clients.consumer.KafkaConsumer. > > commitSync(KafkaConsumer.java:1059) > > at org.apache.kafka.clients.consumer.KafkaConsumer. > > commitSync(KafkaConsumer.java:1027) > > > > When this happens, all subsequent responses are also shifted: > > java.lang.IllegalStateException: Correlation id for response (743) > > does not match request (742) > > java.lang.IllegalStateException: Correlation id for response (744) > > does not match request (743) > > java.lang.IllegalStateException: Correlation id for response (745) > > does not match request (744) > > java.lang.IllegalStateException: Correlation id for response (746) > > does not match request (745) > > ... > > It's easy to discard and recreate the consumer instance to recover > > however we can't do that with the producer as it occurs in the Sender > > thread. > > > > Our cluster and our clients are running Kafka 0.10.0.1. > > Under which circumstances would such an error happen ? > > Even with logging set to TRACE, we can't spot anything suspicious > > shortly before the issue. Is there any data we should try to capture > > when this happens ? > > > > Thanks! > > >