Hi All,

For one of my Kafka producers, I see that the Producer Record Error rate is non-zero: out of the roughly 3000 messages per second that I expect to be producing to the topic, this metric shows a rate of about 200. Does this indicate that those records failed to be sent to the Kafka topic, or is this metric also counted for each retry in the Producer.Send operation?

Notes:

1) I have distributed 8 brokers equally across 2 sites. Using rack-awareness, I make Kafka place replicas equally across both sites. My min.isr=2 and replication factor=4, so 2 replicas are located in each site.

2) The scenario I am testing is shutting down the set of 4 brokers in one site (out of 8) for an extended period and then bringing them back up after, say, 2 hours. This causes the follower replicas on those brokers to try to catch up with the leader replicas on the other brokers. The error rate I am referring to shows up in this scenario, when the brokers are restarted. It does not show up when only the other set of 4 brokers is running.

To be specific, here are the errors that I see in the Kafka producer log file:

2022-04-03 15:56:39.613 WARN --- [-thread | producer-1] o.a.k.c.p.i.Sender : [Producer clientId=producer-1] Got error produce response with correlation id 16512434 on topic-partition input-topic-114, retrying (2147483646 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER
2022-04-03 15:56:39.613 WARN --- [-thread | producer-1] o.a.k.c.p.i.Sender : [Producer clientId=producer-1] Got error produce response with correlation id 16512434 on topic-partition input-topic-58, retrying (2147483646 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER
2022-04-03 15:56:39.613 INFO --- [-thread | producer-1] o.a.k.c.p.i.TransactionManager : [Producer clientId=producer-1] ProducerId set to 2040 with epoch 159
2022-04-03 15:56:39.613 INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch : Resetting sequence number of batch with current sequence 3 for partition input-topic-114 to 0
2022-04-03 15:56:39.613 INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch : Resetting sequence number of batch with current sequence 5 for partition input-topic-114 to 2
2022-04-03 15:56:39.613 INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch : Resetting sequence number of batch with current sequence 6 for partition input-topic-114 to 3
2022-04-03 15:56:39.613 INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch : Resetting sequence number of batch with current sequence 1 for partition input-topic-58 to 0
2022-04-03 15:56:39.739 WARN --- [-thread | producer-1] o.a.k.c.p.i.Sender : [Producer clientId=producer-1] Got error produce response with correlation id 16512436 on topic-partition input-topic-82, retrying (2147483646 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER

Regards,
Neeraj
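
P.S. As a sanity check on the replica layout in note 1, here is a minimal sketch of the arithmetic behind the test. It is only an illustration: the `Layout` class and its method names are made up for this post, and it assumes that the producer uses acks=all, that rack-awareness spreads the replicas evenly, and that every replica on the surviving site stays in sync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical helper describing the replica layout from note 1."""
    sites: int
    replication_factor: int
    min_insync_replicas: int

    def replicas_per_site(self) -> int:
        # Assumption: rack-awareness places replicas evenly across sites.
        return self.replication_factor // self.sites

    def in_sync_after_site_loss(self, sites_down: int) -> int:
        # Assumption: all replicas on the surviving sites remain in sync.
        return self.replication_factor - sites_down * self.replicas_per_site()

    def accepts_acks_all_writes(self, sites_down: int) -> bool:
        # With acks=all, a write is accepted only while the in-sync
        # replica count is at least min.insync.replicas.
        return self.in_sync_after_site_loss(sites_down) >= self.min_insync_replicas


# Values taken from note 1: 2 sites, replication factor 4, min.isr 2.
layout = Layout(sites=2, replication_factor=4, min_insync_replicas=2)
print(layout.replicas_per_site())          # 2
print(layout.in_sync_after_site_loss(1))   # 2
print(layout.accepts_acks_all_writes(1))   # True
```

Under these assumptions, losing one site leaves exactly 2 in-sync replicas, which still satisfies min.isr=2, so I would expect writes to keep being accepted while the 4 brokers are down.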