[ https://issues.apache.org/jira/browse/KAFKA-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038024#comment-17038024 ]
Matthias J. Sax edited comment on KAFKA-9552 at 2/17/20 1:15 AM:
-----------------------------------------------------------------

{quote}currently we suspect that an OutOfSequence may be thrown unnecessary
{quote}
This is interesting – however, the question is what the root cause is. If it's broker side, I agree that we might want to work around it client side. However, if the issue is in the producer, we should fix the producer instead.
{quote}What we could do is something like rejoin the group with a “poison pill” subscriptionInfo, upon which the leader would inform all threads to shut down.
{quote}
Interesting idea. In general I agree that _if_ we treat this error as fatal, we need to make sure all threads stop processing – otherwise, i.e., if we don't consider it fatal, we could just recover the task locally.

I think the difference from past "fatal errors" is that this one does not propagate automatically. For example, an authorization error would affect all threads in the same way, and if we time out talking to a specific broker, the thread picking up the task would also time out.


was (Author: mjsax):
{quote}currently we suspect that an OutOfSequence may be thrown unnecessary
{quote}
This is interesting – however, the question is what the root cause would be? If it's broker side, I agree that we might want to work around it client side. However, if the issue is in the producer, we should fix the producer instead.
{quote}What we could do is something like rejoin the group with a “poison pill” subscriptionInfo, upon which the leader would inform all threads to shut down.
{quote}
Interesting idea. In general I agree, that _if_ we treat this error as fatal, we need to make sure all threads stop processing – otherwise, i.e., if we don't think it fatal, we could just recover the task locally.
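
To make the two options in the comment concrete, here is a minimal, self-contained sketch of the "poison pill" idea next to local recovery. It does not use any Kafka Streams API: the class `PoisonPillSketch`, the enums `Subscription`, `Directive` and `Reaction`, and the methods `onOutOfOrderSequence` and `assign` are all hypothetical names invented for illustration, and the real subscriptionInfo encoding and leader-side assignment logic are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PoisonPillSketch {

    /** Hypothetical stand-in for what a member encodes in its subscriptionInfo when it (re)joins. */
    enum Subscription { NORMAL, POISON_PILL }

    /** Hypothetical instruction the group leader hands back to each member. */
    enum Directive { CONTINUE, SHUT_DOWN }

    /** How a single thread reacts to an out-of-order-sequence error on one of its tasks. */
    enum Reaction { RECOVER_TASK_LOCALLY, REJOIN_WITH_POISON_PILL }

    /**
     * If the error is treated as fatal it must be propagated to the whole group,
     * because (unlike e.g. an authorization error) other threads will not hit it on their own.
     * Otherwise the affected task can simply be recovered on the local thread.
     */
    static Reaction onOutOfOrderSequence(boolean treatAsFatal) {
        return treatAsFatal ? Reaction.REJOIN_WITH_POISON_PILL : Reaction.RECOVER_TASK_LOCALLY;
    }

    /**
     * Leader-side step: a single poison pill among the subscriptions
     * turns into a SHUT_DOWN directive for every member.
     */
    static Map<String, Directive> assign(Map<String, Subscription> subscriptions) {
        boolean poisoned = subscriptions.containsValue(Subscription.POISON_PILL);
        Map<String, Directive> directives = new LinkedHashMap<>();
        for (String member : subscriptions.keySet()) {
            directives.put(member, poisoned ? Directive.SHUT_DOWN : Directive.CONTINUE);
        }
        return directives;
    }

    public static void main(String[] args) {
        Map<String, Subscription> subscriptions = new LinkedHashMap<>();
        // StreamThread-1 hit the error and decided it is fatal, so it rejoins with a poison pill.
        Subscription failing = onOutOfOrderSequence(true) == Reaction.REJOIN_WITH_POISON_PILL
                ? Subscription.POISON_PILL
                : Subscription.NORMAL;
        subscriptions.put("StreamThread-1", failing);
        subscriptions.put("StreamThread-2", Subscription.NORMAL);
        System.out.println(assign(subscriptions));
    }
}
```

With `treatAsFatal` set to `false`, no poison pill is sent and every member keeps processing, which corresponds to the "recover the task locally" branch.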

> Stream should handle OutOfSequence exception thrown from Producer
> -----------------------------------------------------------------
>
>                 Key: KAFKA-9552
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9552
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.5.0
>            Reporter: Boyang Chen
>            Priority: Major
>
> As of today, the stream thread could die from an OutOfSequence error:
> {code:java}
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) [2020-02-12 15:14:35,185] ERROR [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] stream-thread [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] Failed to commit stream task 3_2 due to the following error: (org.apache.kafka.streams.processor.internals.AssignedStreamsTasks)
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.streams.errors.StreamsException: task [3_2] Abort sending since an error caught with a previous record (timestamp 1581484094825) to topic stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-0000000049-changelog due to org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:154)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:214)
>     at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1353)
> {code}
> Although this is a fatal exception for the Producer, Streams should treat it as an opportunity to reinitialize by doing a rebalance, instead of killing the computation resource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)