Hi Liam,

I think you've said it well.

David Finnie

Infrasoft Pty Limited

On 4/04/2022 15:10, Liam Clarke-Hutchinson wrote:
Thanks Neeraj,

From reading the code, I am reasonably certain that no data loss occurred:
the producer reset the batch sequence numbers, and then tried again.

I refer you to this comment in the code of the producer's Sender:

    // tell the user the result of their request. We only adjust sequence numbers if the batch didn't exhaust
    // its retries -- if it did, we don't know whether the sequence number was accepted or not, and
    // thus it is not safe to reassign the sequence.

So yeah, the fact that you see it tweaking sequence numbers is because the
batch hasn't exhausted its retry limit, so the sequence number is adjusted,
and then sending the batch is tried again.

That said, if you want to be sure, and can replicate this issue, you could
perhaps modify the code to use the producer send method overload that
accepts a callback. The callback reports success or failure once a record
has either been acknowledged or has failed fatally, so you can log any
exceptions that show up there.
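
A rough, untested sketch of that idea, assuming String keys and values.
The class name SendOutcomeLogger, its counters and the sendWithLogging
helper are made up for illustration; only the Callback interface and the
send(ProducerRecord, Callback) overload come from the Kafka client. As far
as I can tell the callback runs once per record with its final outcome, not
once per internal retry, so a retried batch that eventually succeeds should
be counted as a success here.

```java
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

/** Hypothetical helper: counts and logs the final outcome of each send. */
public class SendOutcomeLogger implements Callback {
    private final AtomicLong succeeded = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    // Called by the producer when a record is acknowledged or when the
    // producer gives up on it; exception is null on success.
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception == null) {
            succeeded.incrementAndGet();
        } else {
            failed.incrementAndGet();
            System.err.println("Record send failed: " + exception);
        }
    }

    public long succeeded() { return succeeded.get(); }

    public long failed() { return failed.get(); }

    // Illustrative usage: share one logger instance across all sends.
    public static void sendWithLogging(Producer<String, String> producer,
                                       SendOutcomeLogger logger,
                                       String topic, String key, String value) {
        producer.send(new ProducerRecord<>(topic, key, value), logger);
    }
}
```

If failed() stays at zero while the OUT_OF_ORDER_SEQUENCE_NUMBER warnings
keep appearing, that would support the reading that those batches were
retried and eventually written.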

Disclaimer - I'm unfamiliar with these internals of producers, so my
reading of the code could be wrong, so I strongly welcome correction from
other mailing list members if that's the case.

Cheers,

Liam Clarke


On Mon, 4 Apr 2022 at 16:29, Neeraj Vaidya
<neeraj.vai...@yahoo.co.in.invalid> wrote:

  Hi Liam,
Brokers are on Apache Kafka v2.7.0
However, the Producer client is using the v2.6 libraries.

Regards,
Neeraj

On Monday, 4 April, 2022, 02:17:42 pm GMT+10, Liam Clarke-Hutchinson
<lclar...@redhat.com> wrote:

  Hi Neeraj,

Not sure just yet, I'm diving into the code to find out. Oh, what version
of Kafka are you running, please?

Cheers,

Liam

On Mon, 4 Apr 2022 at 14:50, Neeraj Vaidya
<neeraj.vai...@yahoo.co.in.invalid> wrote:

  Hi Liam,
Thanks for getting back.

1) Producer settings (I am guessing these are the ones you are interested
in)
enable.idempotence=true
max.in.flight.requests.per.connection=5

2) Sample broker logs corresponding to the timestamp in the application
logs of the Producer

[2022-04-03 15:56:39,587] ERROR [ReplicaManager broker=5] Error processing append operation on partition input-topic-114 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid sequence number for new epoch at offset 967756 in partition input-topic-114: 158 (request epoch), 3 (seq. number)

Do the producer errors indicate that these messages never made it to the
Kafka topic at all?

Regards,
Neeraj

On Monday, 4 April, 2022, 12:23:30 pm GMT+10, Liam Clarke-Hutchinson
<lclar...@redhat.com> wrote:

  Hi Neeraj,

First off, what are your producer settings?
Secondly, do you have broker logs for the leaders of some of your affected
topics on hand at all?

Cheers,

Liam Clarke-Hutchinson

On Mon, 4 Apr 2022 at 14:04, Neeraj Vaidya
<neeraj.vai...@yahoo.co.in.invalid> wrote:

Hi All,
For one of the Kafka producers that I have, I see that the Producer Record
Error rate is non-zero, i.e. out of the 3000 messages per second which I
expect to be producing to the topic, I can see that this metric shows a
rate of about 200.
Does this indicate that the records failed to be sent to the Kafka topic,
or does this metric show up even for each retry in the Producer.Send
operation?

Notes :
1) I have distributed 8 brokers equally across 2 sites. Using
rack-awareness, I am making Kafka position replicas equally across both
sites. My min.isr=2 and replication factor = 4, so 2 replicas are located
in each site.
2) The scenario I am testing is that of shutting down a set of 4 brokers
in one site (out of 8) for an extended period of time and then bringing
them back up after, say, 2 hours. This causes the follower replicas on
those brokers to try to catch up with the leader replicas on the other
brokers. The error rate that I am referring to shows up under this scenario
of restarting the brokers. It does not show up when I have just the other
set of (4) brokers running.

To be specific, here are the errors that I see in the Kafka producer log
file:

2022-04-03 15:56:39.613  WARN --- [-thread | producer-1] o.a.k.c.p.i.Sender                      : [Producer clientId=producer-1] Got error produce response with correlation id 16512434 on topic-partition input-topic-114, retrying (2147483646 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER
2022-04-03 15:56:39.613  WARN --- [-thread | producer-1] o.a.k.c.p.i.Sender                      : [Producer clientId=producer-1] Got error produce response with correlation id 16512434 on topic-partition input-topic-58, retrying (2147483646 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER
2022-04-03 15:56:39.613  INFO --- [-thread | producer-1] o.a.k.c.p.i.TransactionManager          : [Producer clientId=producer-1] ProducerId set to 2040 with epoch 159
2022-04-03 15:56:39.613  INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch                : Resetting sequence number of batch with current sequence 3 for partition input-topic-114 to 0
2022-04-03 15:56:39.613  INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch                : Resetting sequence number of batch with current sequence 5 for partition input-topic-114 to 2
2022-04-03 15:56:39.613  INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch                : Resetting sequence number of batch with current sequence 6 for partition input-topic-114 to 3
2022-04-03 15:56:39.613  INFO --- [-thread | producer-1] o.a.k.c.p.i.ProducerBatch                : Resetting sequence number of batch with current sequence 1 for partition input-topic-58 to 0
2022-04-03 15:56:39.739  WARN --- [-thread | producer-1] o.a.k.c.p.i.Sender                      : [Producer clientId=producer-1] Got error produce response with correlation id 16512436 on topic-partition input-topic-82, retrying (2147483646 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER

Regards,
Neeraj

