[
https://issues.apache.org/jira/browse/KAFKA-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037683#comment-17037683
]
Matthias J. Sax edited comment on KAFKA-9552 at 11/23/20, 4:59 PM:
-------------------------------------------------------------------
I am not sure we need to rebalance: if we had missed a rebalance and lost the
task, we would get a `ProducerFencedException` instead. Hence, on this error we
should still be part of the consumer group.
From my understanding, an `OutOfOrderSequenceException` implies data loss, i.e.,
we got an ack back, but on the next send the data is not in the log (this could
happen if unclean leader election is enabled broker side); otherwise it should
indicate a severe bug.
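To make the data-loss scenario concrete, here is a toy model of the sequence check, assuming a log that accepts a record only when its sequence number directly follows the last one it holds. `ToyPartitionLog`, `truncateTo`, and the nested `OutOfOrderSequence` type are hypothetical names for illustration; this is not the broker implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are hypothetical, this is not Kafka broker code.
final class ToyPartitionLog {

    // Local stand-in for the broker-side rejection; not the real Kafka exception class.
    static final class OutOfOrderSequence extends RuntimeException {
        OutOfOrderSequence(String message) {
            super(message);
        }
    }

    private final List<Integer> sequences = new ArrayList<>();

    // Accept a record only if its sequence number directly follows the last one in the log.
    void append(int sequence) {
        int expected = sequences.isEmpty() ? 0 : sequences.get(sequences.size() - 1) + 1;
        if (sequence != expected) {
            throw new OutOfOrderSequence("expected " + expected + " but got " + sequence);
        }
        sequences.add(sequence);
    }

    // Simulate an unclean leader election: the new leader is missing the newest records.
    void truncateTo(int size) {
        while (sequences.size() > size) {
            sequences.remove(sequences.size() - 1);
        }
    }

    public static void main(String[] args) {
        ToyPartitionLog log = new ToyPartitionLog();
        log.append(0);
        log.append(1);
        log.append(2);      // all three sends are acked
        log.truncateTo(1);  // acked records 1 and 2 are lost
        try {
            log.append(3);  // the producer continues from its last acked sequence
        } catch (OutOfOrderSequence e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the rejection can only happen after acked records disappear from the log, which is why the error points at either data loss or a bug.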
While we could abort the current transaction and reinitialize the task (i.e.,
refetch the input topic offsets, clean up the state, etc.), I am wondering if we
should do this, as it would mask a bug. Instead, it might be better to not catch
the exception and fail fast, so that we can report this error?
Btw: In a recent PR we started to catch `OutOfOrderSequenceException` in
`RecordCollectorImpl` and rethrow it as `TaskMigratedException` for this case.
However, I am not sure if we should keep this change or roll it back, for the
same reason.
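The two options can be compared in a stand-alone sketch. Everything below is an assumption for illustration: the nested exception types are local stand-ins that only borrow the names used in this thread, and `SendErrorSketch`, `Policy`, and `translate` are hypothetical. It is not the actual `RecordCollectorImpl` code.

```java
// Stand-alone sketch: no Kafka dependency, all types are local stand-ins.
final class SendErrorSketch {

    static class ProducerFencedException extends RuntimeException { }

    static class OutOfOrderSequenceException extends RuntimeException { }

    static class TaskMigratedException extends RuntimeException {
        TaskMigratedException(Throwable cause) {
            super(cause);
        }
    }

    static class StreamsException extends RuntimeException {
        StreamsException(Throwable cause) {
            super(cause);
        }
    }

    // Hypothetical switch between the two behaviors discussed in this thread.
    enum Policy { REINITIALIZE_TASK, FAIL_FAST }

    // Map a producer send error to the exception the stream thread would see.
    static RuntimeException translate(RuntimeException sendError, Policy policy) {
        if (sendError instanceof ProducerFencedException) {
            // Assumed handling: being fenced means the task was lost in a missed rebalance.
            return new TaskMigratedException(sendError);
        }
        if (sendError instanceof OutOfOrderSequenceException
                && policy == Policy.REINITIALIZE_TASK) {
            // Option 1: treat it like a migrated task and reinitialize (may mask a bug).
            return new TaskMigratedException(sendError);
        }
        // Option 2: fail fast so that the error is surfaced and can be reported.
        return new StreamsException(sendError);
    }

    public static void main(String[] args) {
        RuntimeException error = new OutOfOrderSequenceException();
        System.out.println(translate(error, Policy.REINITIALIZE_TASK).getClass().getSimpleName());
        System.out.println(translate(error, Policy.FAIL_FAST).getClass().getSimpleName());
    }
}
```

The only difference between the policies is the branch for the out-of-order case; the fenced case is handled the same way in both.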
cc [~guozhang] [~hachikuji] [~vvcephei]
> Stream should handle OutOfSequence exception thrown from Producer
> -----------------------------------------------------------------
>
> Key: KAFKA-9552
> URL: https://issues.apache.org/jira/browse/KAFKA-9552
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 2.5.0
> Reporter: Boyang Chen
> Priority: Major
>
> As of today, the stream thread can die from an OutOfOrderSequenceException:
> {code:java}
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) [2020-02-12 15:14:35,185] ERROR [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] stream-thread [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] Failed to commit stream task 3_2 due to the following error: (org.apache.kafka.streams.processor.internals.AssignedStreamsTasks)
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.streams.errors.StreamsException: task [3_2] Abort sending since an error caught with a previous record (timestamp 1581484094825) to topic stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-0000000049-changelog due to org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:154)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:214)
>     at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1353)
> {code}
> Although this is a fatal exception for the Producer, Streams should treat it
> as an opportunity to reinitialize by doing a rebalance, instead of killing the
> computation resource.