Hi Have you set replication.factor and retries properties?
BR tors 5 okt. 2017 kl. 18:45 skrev Dmitriy Vsekhvalnov <dvsekhval...@gmail.com >: > Hi all, > > we were testing Kafka cluster outages by randomly crashing broker nodes (1 > of 3 for instance) while still keeping majority of replicas available. > > Time to time our kafka-stream app crashing with exception: > > [ERROR] [StreamThread-1] > [org.apache.kafka.streams.processor.internals.StreamThread] [stream-thread > [StreamThread-1] Failed while executing StreamTask 0_1 due to flush state: > ] org.apache.kafka.streams.errors.StreamsException: task [0_1] exception > caught when producing at > > org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121) > at > > org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129) > at > > org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:422) > at > > org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:555) > at > > org.apache.kafka.streams.processor.internals.StreamThread.performOnTasks(StreamThread.java:501) > at > > org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:551) > at > > org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:449) > at > > org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:391) > at > > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:372) > Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 > record(s) for > > audit-metrics-collector-store_context.operation-message.bucketStartMs-message.dc-repartition-0: > 30026 ms has passed since batch creation plus linger time > > after that clearly only restart of app and it continues processing. > > We believe it is correlated with our outage testing and question is: what > are recommended options for make kafka-stream application more resilient to > broker crashes? > > Thank you. >