Which Kafka release are you using? I was looking at some potentially related JIRAs, such as KAFKA-6015.
FYI

On Wed, Apr 4, 2018 at 3:05 AM, Saheb Motiani <sah...@cakesolutions.net> wrote:

> Hi all,
>
> We have been seeing this issue intermittently, and hence it's difficult to
> give step-by-step instructions to reproduce it. I have been studying the
> code of Sender.java
> (org.apache.kafka.clients.producer.internals.Sender.java), but haven't been
> able to find the possible bug.
>
> Our setup is a 3-node Kafka cluster.
>
> Here are some relevant logs:
>
> 2018-03-28 09:50:54,290 ERROR [kafka-producer-network-thread | producer-1]
> o.a.k.c.producer.internals.Sender:301 - [Producer clientId=producer-1] The
> broker returned org.apache.kafka.common.errors.UnknownProducerIdException:
> This exception is raised by the broker if it could not locate the producer
> metadata associated with the producerId in question. This could happen if,
> for instance, the producer's records were deleted because their retention
> time had elapsed. Once the last records of the producerId are removed, the
> producer's metadata is removed from the broker, and future appends by the
> producer will return this exception. for topic-partition pipeline-0 at
> offset -1. This indicates data loss on the broker, and should be
> investigated.
>
> 2018-03-28 09:51:13,394 WARN [kafka-producer-network-thread | producer-1]
> o.a.k.c.producer.internals.Sender:251 - [Producer clientId=producer-1] Got
> error produce response with correlation id 1000 on topic-partition
> pipeline-3, retrying (2147483459 attempts left). Error:
> OUT_OF_ORDER_SEQUENCE_NUMBER
>
> 2018-03-28 10:48:33,365 WARN [kafka-producer-network-thread | producer-1]
> o.a.k.c.producer.internals.Sender:251 - [Producer clientId=producer-1] Got
> error produce response with correlation id 34893 on topic-partition
> pipeline-3, retrying (2147449585 attempts left). Error:
> OUT_OF_ORDER_SEQUENCE_NUMBER
>
> [2018-03-28 09:50:54,421] ERROR [ReplicaManager broker=1001] Error
> processing append operation on partition pipeline-3
> (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order
> sequence number for producerId 5102: 2 (incoming seq. number), 7 (current
> end sequence number)
>
> 1. We have some sort of Admin API, which deletes and recreates topics (and
> loads them), and when we delete a topic it creates a new producerId, which
> uses the same producer instance to write messages. (This might be a
> problem, but we don't know for sure.)
>
> 2. We don't always get stuck in these INT_MAX retries (because we have
> enabled idempotence); many times it stops after 30 seconds, as expected,
> and sets a new producerId. (But sometimes that timeout exception doesn't
> get triggered.)
>
> 2018-03-29 10:16:54,826 INFO [kafka-producer-network-thread | producer-1]
> o.a.k.c.p.i.TransactionManager:346 - [Producer clientId=producer-1]
> ProducerId set to -1 with epoch -1
> 2018-03-29 10:16:54,827 INFO [kafka-producer-network-thread | producer-1]
> o.a.k.c.p.i.TransactionManager:346 - [Producer clientId=producer-1]
> ProducerId set to 9002 with epoch 0
>
> ---
> We are looking to eliminate this nondeterministic behaviour by handling
> OUT_OF_ORDER_SEQUENCE_NUMBER in a better way (maybe re-instantiating the
> producer, but we are not sure that would solve anything, as Kafka has ways
> to reset the producerId after a timeout).
>
> Any ideas/comments on why this is happening despite the default timeout of
> 30 seconds?
>
> Please let me know if you need more information to understand the problem
> we are facing.
>
> Regards,
> Saheb
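
One thing the two quoted WARN lines do show is how long the producer kept retrying on pipeline-3. Here is a minimal sketch that reads the "attempts left" counter and the timestamp from each line. It assumes the retry budget started at Integer.MAX_VALUE (which matches the "INT_MAX retries" described in the report); the class and method names are made up for illustration and are not part of the Kafka client.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka: estimates how many retries a
// producer has consumed from the "attempts left" counter in its WARN logs.
final class RetryBudget {
    // Timestamp layout at the start of the client log lines,
    // e.g. "2018-03-28 09:51:13,394" (23 characters).
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final Pattern ATTEMPTS =
            Pattern.compile("\\((\\d+) attempts left\\)");

    // Assumption: the budget started at Integer.MAX_VALUE.
    public static long retriesUsed(String logLine) {
        Matcher m = ATTEMPTS.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no 'attempts left' counter in line");
        }
        return Integer.MAX_VALUE - Long.parseLong(m.group(1));
    }

    public static LocalDateTime timestamp(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), TS);
    }

    // Average retry rate between two log lines from the same producer.
    public static double retriesPerSecond(String earlier, String later) {
        long retries = retriesUsed(later) - retriesUsed(earlier);
        double seconds =
                Duration.between(timestamp(earlier), timestamp(later)).toMillis() / 1000.0;
        return retries / seconds;
    }

    public static void main(String[] args) {
        // Shortened copies of the two WARN lines from the report.
        String first = "2018-03-28 09:51:13,394 WARN [Producer clientId=producer-1] "
                + "pipeline-3, retrying (2147483459 attempts left).";
        String second = "2018-03-28 10:48:33,365 WARN [Producer clientId=producer-1] "
                + "pipeline-3, retrying (2147449585 attempts left).";
        System.out.printf("retries used at first line:  %d%n", retriesUsed(first));
        System.out.printf("retries used at second line: %d%n", retriesUsed(second));
        System.out.printf("average retries per second:  %.2f%n",
                retriesPerSecond(first, second));
    }
}
```

Under that assumption the counters work out to 188 retries used at 09:51:13 and 34062 at 10:48:33, i.e. roughly ten retries per second for almost an hour. That is about what continuous retrying with retry.backoff.ms at its default of 100 ms would look like, so in this case the 30-second timeout apparently never ended the retry loop.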