Hi Liam,

Thanks for getting back.

1) Producer settings (I am guessing these are the ones you are interested in):

enable.idempotence=true
max.in.flight.requests.per.connection=5

2) Sample broker logs corresponding to the timestamp in the application logs of the producer:

[2022-04-03 15:56:39,587] ERROR [ReplicaManager broker=5] Error processing append operation on partition input-topic-114 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid sequence number for new epoch at offset 967756 in partition input-topic-114: 158 (request epoch), 3 (seq. number)

Do the producer errors indicate that these messages never made it to the Kafka topic at all?

Regards,
Neeraj

On Monday, 4 April, 2022, 12:23:30 pm GMT+10, Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

Hi Neeraj,

First off, what are your producer settings? Secondly, do you have broker logs for the leaders of some of your affected topics on hand at all?

Cheers,

Liam Clarke-Hutchinson

On Mon, 4 Apr 2022 at 14:04, Neeraj Vaidya <neeraj.vai...@yahoo.co.in.invalid> wrote:

> Hi All,
> For one of the Kafka producers that I have, I see that the Producer Record
> Error rate is non-zero, i.e. out of the 3000 messages per second
> which I expect to be producing to the topic, I can see that this metric
> shows a rate of about 200.
> Does this indicate that the records failed to be sent to the Kafka topic,
> or does this metric show up even for each retry in the Producer.Send
> operation?
>
> Notes:
> 1) I have distributed 8 brokers equally across 2 sites. Using
> rack-awareness, I am making Kafka position replicas equally across both
> sites. My min.isr=2 and replication factor = 4. This places 2 replicas
> in each site.
> 2) The scenario I am testing is that of shutting down a set of 4 brokers
> in one site (out of 8) for an extended period of time and then bringing
> them back up after, say, 2 hours. This causes the follower replicas on
> those brokers to try and catch up with the leader replicas on the other
> brokers. The error rate that I am referring to shows up under this scenario
> of restarting the brokers.
> It does not show up when I have just the other
> set of (4) brokers.
>
> To be specific, here are the errors that I see in the Kafka producer log
> file:
>
> 2022-04-03 15:56:39.613 WARN --- [-thread | producer-1]
> o.a.k.c.p.i.Sender : [Producer clientId=producer-1]
> Got error produce response with correlation id 16512434 on topic-partition
> input-topic-114, retrying (2147483646 attempts left). Error:
> OUT_OF_ORDER_SEQUENCE_NUMBER
> 2022-04-03 15:56:39.613 WARN --- [-thread | producer-1]
> o.a.k.c.p.i.Sender : [Producer clientId=producer-1]
> Got error produce response with correlation id 16512434 on topic-partition
> input-topic-58, retrying (2147483646 attempts left). Error:
> OUT_OF_ORDER_SEQUENCE_NUMBER
> 2022-04-03 15:56:39.613 INFO --- [-thread | producer-1]
> o.a.k.c.p.i.TransactionManager : [Producer clientId=producer-1]
> ProducerId set to 2040 with epoch 159
> 2022-04-03 15:56:39.613 INFO --- [-thread | producer-1]
> o.a.k.c.p.i.ProducerBatch : Resetting sequence number of
> batch with current sequence 3 for partition input-topic-114 to 0
> 2022-04-03 15:56:39.613 INFO --- [-thread | producer-1]
> o.a.k.c.p.i.ProducerBatch : Resetting sequence number of
> batch with current sequence 5 for partition input-topic-114 to 2
> 2022-04-03 15:56:39.613 INFO --- [-thread | producer-1]
> o.a.k.c.p.i.ProducerBatch : Resetting sequence number of
> batch with current sequence 6 for partition input-topic-114 to 3
> 2022-04-03 15:56:39.613 INFO --- [-thread | producer-1]
> o.a.k.c.p.i.ProducerBatch : Resetting sequence number of
> batch with current sequence 1 for partition input-topic-58 to 0
> 2022-04-03 15:56:39.739 WARN --- [-thread | producer-1]
> o.a.k.c.p.i.Sender : [Producer clientId=producer-1]
> Got error produce response with correlation id 16512436 on topic-partition
> input-topic-82, retrying (2147483646 attempts left). Error:
> OUT_OF_ORDER_SEQUENCE_NUMBER
>
> Regards,
> Neeraj
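
For anyone reading the broker error and the "Resetting sequence number" lines together, here is a minimal sketch in Java of the kind of per-partition check that can produce "Invalid sequence number for new epoch". It is a toy model under stated assumptions, not Kafka's actual code: the class SequenceCheckSketch, its append method, and the earlier epoch value 157 are all hypothetical, and fencing of older epochs, duplicate detection and sequence wraparound are left out. The rule it models is that a batch arriving with an epoch the partition has not yet seen must start at sequence 0, and a batch within the same epoch must continue directly from the last accepted sequence.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of per-partition sequence validation for an
// idempotent producer. Not Kafka's actual code: fencing of older epochs,
// duplicate detection and sequence wraparound are all left out.
class SequenceCheckSketch {

    static final class OutOfOrderSequence extends RuntimeException {
        OutOfOrderSequence(String message) {
            super(message);
        }
    }

    // Last accepted epoch and sequence number for one producer id.
    private static final class State {
        final int epoch;
        final int lastSeq;

        State(int epoch, int lastSeq) {
            this.epoch = epoch;
            this.lastSeq = lastSeq;
        }
    }

    private final Map<Long, State> states = new HashMap<>();

    // Validates one batch and, if accepted, records its last sequence number.
    void append(long producerId, int epoch, int firstSeq, int recordCount) {
        State prev = states.get(producerId);
        if (prev != null) {
            if (epoch != prev.epoch) {
                // A batch with a different epoch must restart at sequence 0.
                if (firstSeq != 0) {
                    throw new OutOfOrderSequence("Invalid sequence number for new epoch: "
                            + epoch + " (request epoch), " + firstSeq + " (seq. number)");
                }
            } else if (firstSeq != prev.lastSeq + 1) {
                // Within one epoch, batches must be contiguous.
                throw new OutOfOrderSequence("Out of order sequence number: "
                        + firstSeq + " (incoming), " + prev.lastSeq + " (last accepted)");
            }
        }
        states.put(producerId, new State(epoch, firstSeq + recordCount - 1));
    }

    public static void main(String[] args) {
        SequenceCheckSketch partition = new SequenceCheckSketch();
        // Assumed earlier state; epoch 157 is illustrative, not taken from the logs.
        partition.append(2040L, 157, 0, 3);
        try {
            // Mirrors the broker log: request epoch 158 arriving with sequence 3.
            partition.append(2040L, 158, 3, 2);
        } catch (OutOfOrderSequence e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // Mirrors the producer log: epoch 159, sequences reset from 3, 5, 6 to 0, 2, 3.
        partition.append(2040L, 159, 0, 2);
        partition.append(2040L, 159, 2, 1);
        partition.append(2040L, 159, 3, 1);
        System.out.println("accepted after reset");
    }
}
```

In this model a rejected batch leaves the stored state untouched, so the producer's retry with a bumped epoch and sequences renumbered from 0 is accepted. Whether the real producer's retries all succeed in the scenario above is a separate question that this sketch does not answer.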