Thanks Matthias, I think it's related to https://issues.apache.org/jira/browse/KAFKA-8802. Once I disabled the cache, bytes out goes up a lot. Before the bytes out are kept at 14MB/S no matter bytes in is high or low. I'm trying to download the 2.3.1-rc2 build and try it again.
On Thu, Oct 24, 2019 at 1:27 AM Matthias J. Sax <matth...@confluent.io> wrote: > > >> 1) The app has a huge consuming lag on both source topic and internal > >> repartition topics , ~5 M messages and keeps growing. Will the lag > >> lead to this timeout exception? My understanding is the app polls too > >> many messages before it could send out even the lag indicates it still > >> polls too slow? > > polling and writing messages are independent. Also, an increasing > consumer lag itself, would not lead to a TimeoutException. > > It's hard to say in general, but it might be a network issue? Did you > try to increase `retries` config as suggested by the error message? > Did you also check the health of the brokers? If the network is stable > but brokers are unhealthy, it may also lead to timeout exceptions. > > For the increasing consumer lag, scaling out the application should > actually help. > > > -Matthias > > > On 9/27/19 2:03 PM, Xiyuan Hu wrote: > > Hi, > > > > I'm running Kafka Streams v2.1.0 with windowing function and 3 threads > > per node. The input traffic is about 120K messages/sec. Once deploy, > > after couple minutes, some thread will get TimeoutException and goes > > to DEAD state. > > > > 2019-09-27 13:04:34,449 ERROR [client-StreamThread-2] > > o.a.k.s.p.i.AssignedStreamsTasks stream-thread [client-StreamThread-2] > > Failed to commit stream task 1_35 due to the following error: > > org.apache.kafka.streams.errors.StreamsException: task [1_35] Abort > > sending since an error caught with a previous record (key > > 174044298638db0 value [B@33423c0 timestamp 1569613689747) to topic > > XXX-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog due to > > org.apache.kafka.common.errors.TimeoutException: Expiring 45 record(s) > > for XXX-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog-35:300102 ms > > has passed since batch creation > > You can increase producer parameter `retries` and `retry.backoff.ms` > > to avoid this error. > > at > > org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:133) > > Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring > > 45 record(s) for > > XXX-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog-35:300102 ms has > > passed since batch creation > > > > I changed linger.ms to 100 and request.timeout.ms to 300000. > > > > A couple questions: > > 1) The app has a huge consuming lag on both source topic and internal > > repartition topics , ~5 M messages and keeps growing. Will the lag > > lead to this timeout exception? My understanding is the app polls too > > many messages before it could send out even the lag indicates it still > > polls too slow? > > > > 2) I checked the thread status from streams.localThreadsMetadata() and > > found that, a lot threads are switching between RUNNING and > > PARTITIONS_REVOKED. Eventually, stuck at PARTITIONS_REVOKED. I didn't > > see any error message from the log related to this. What might be a > > good approach to trace the cause? > > > > Thanks a lot! Any help is appreciated! > > >