[ https://issues.apache.org/jira/browse/KAFKA-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037912#comment-17037912 ]
John Roesler commented on KAFKA-9552:
-------------------------------------

Regarding “fail fast”, it does seem like potentially the best reaction if we detect we may have lost data, since continuing to work could make a bigger mess, e.g. by computing results on local state that has been lost from the changelog.

We’ve discussed “fail fast” before, and I’d just reiterate that killing the thread alone isn’t sufficient for failing fast, since another thread would just come along and pick up the (now corrupted) task after a rebalance. What we could do is something like rejoin the group with a “poison pill” subscriptionInfo, upon which the leader would inform all threads to shut down.

> Stream should handle OutOfSequence exception thrown from Producer
> -----------------------------------------------------------------
>
> Key: KAFKA-9552
> URL: https://issues.apache.org/jira/browse/KAFKA-9552
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 2.5.0
> Reporter: Boyang Chen
> Priority: Major
>
> As of today the stream thread could die from OutOfSequence error:
> {code:java}
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) [2020-02-12 15:14:35,185] ERROR [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] stream-thread [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] Failed to commit stream task 3_2 due to the following error: (org.apache.kafka.streams.processor.internals.AssignedStreamsTasks)
> [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.streams.errors.StreamsException: task [3_2] Abort sending since an error caught with a previous record (timestamp 1581484094825) to topic stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-0000000049-changelog due to org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
>         at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:154)
>         at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
>         at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:214)
>         at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1353)
> {code}
> Although this is fatal exception for Producer, stream should treat it as an
> opportunity to reinitialize by doing a rebalance, instead of killing
> computation resource.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
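
The "poison pill" idea in the comment above can be illustrated with a small, self-contained toy model. This is only a sketch under assumptions, not Kafka Streams code: `PoisonPillSketch`, `MemberSubscription`, its `shutdownRequested` flag, `Instruction`, and `assign` are all hypothetical names invented for illustration. It shows one way a group leader might react if any member rejoined with a shutdown marker in its subscription metadata: every member, not just the failing one, is told to stop, so no other thread picks up the possibly corrupted task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the "poison pill" shutdown idea; none of this is Kafka Streams API. */
final class PoisonPillSketch {

    /** Hypothetical stand-in for the metadata a member sends when it (re)joins the group. */
    static final class MemberSubscription {
        final String memberId;
        final boolean shutdownRequested; // hypothetical "poison pill" marker

        MemberSubscription(String memberId, boolean shutdownRequested) {
            this.memberId = memberId;
            this.shutdownRequested = shutdownRequested;
        }
    }

    /** Hypothetical instruction the leader hands back to each member after a rebalance. */
    enum Instruction { RUN, SHUT_DOWN }

    /**
     * Leader-side decision: if at least one member sent the poison pill,
     * every member is instructed to shut down; otherwise everyone keeps running.
     */
    static Map<String, Instruction> assign(List<MemberSubscription> subscriptions) {
        boolean poisoned = false;
        for (MemberSubscription subscription : subscriptions) {
            if (subscription.shutdownRequested) {
                poisoned = true;
                break;
            }
        }
        Instruction instruction = poisoned ? Instruction.SHUT_DOWN : Instruction.RUN;

        // Preserve the order in which members were listed, for readable output.
        Map<String, Instruction> result = new LinkedHashMap<>();
        for (MemberSubscription subscription : subscriptions) {
            result.put(subscription.memberId, instruction);
        }
        return result;
    }

    public static void main(String[] args) {
        List<MemberSubscription> group = new ArrayList<>();
        // The first thread detected possible data loss and rejoins with the marker set.
        group.add(new MemberSubscription("StreamThread-1", true));
        group.add(new MemberSubscription("StreamThread-2", false));
        System.out.println(assign(group));
    }
}
```

In this model a single flagged member is enough to stop the whole group, which is the difference from killing only the failing thread. How such a marker would actually be encoded in the subscription, and how members would act on the leader's response, is left open here.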