Thanks Damian ! That's was it, after fixing number compaction threads to be higher than 1, it finally continue to consume stream.
On Fri, Jun 30, 2017 at 7:48 PM, Dmitriy Vsekhvalnov <dvsekhval...@gmail.com > wrote: > Yeah, can confirm there is only 1 vCPU. > > Okay, will try that configuration and get back to you guys. > > Thanks so far ! > > On Fri, Jun 30, 2017 at 7:35 PM, Damian Guy <damian....@gmail.com> wrote: > >> I'm wondering if it is related to this: >> http://docs.confluent.io/current/streams/faq.html#rocksdb- >> behavior-in-1-core-environments >> It seems like the same issue. >> >> Thanks, >> Damian >> >> On Fri, 30 Jun 2017 at 17:16 Dmitriy Vsekhvalnov <dvsekhval...@gmail.com> >> wrote: >> >> > Yes, StreamThread-1 #93 daemon is still at at org.rocksdb.RocksDB.put. >> > >> > No, AWS machine. >> > >> > kafka-streams v0.10.2.1 >> > >> > May be some option for RockDB that can unlock it? Also i can try to run >> app >> > locally against same env to see if it make difference (though it will be >> > different OS). >> > >> > On Fri, Jun 30, 2017 at 6:33 PM, Damian Guy <damian....@gmail.com> >> wrote: >> > >> > > Yep, if you take another thread dump is it in the same spot? >> > > Which version of streams are you running? Are you using docker? >> > > >> > > Thanks, >> > > Damian >> > > >> > > On Fri, 30 Jun 2017 at 16:22 Dmitriy Vsekhvalnov < >> dvsekhval...@gmail.com >> > > >> > > wrote: >> > > >> > > > Is this one looking suspicious? >> > > > >> > > > >> > > > https://gist.github.com/dvsekhvalnov/146ba41c8e78316941098997c9d2f1 >> > > 8a#file-thread-dump >> > > > >> > > > On Fri, Jun 30, 2017 at 6:10 PM, Damian Guy <damian....@gmail.com> >> > > wrote: >> > > > >> > > > > It looks like Task [4_2] is stuck restoring state, though it >> doesn't >> > > look >> > > > > like there is much state to restore. >> > > > > It might be helpful if you take some thread dumps to see where it >> is >> > > > > blocked. >> > > > > >> > > > > Thanks, >> > > > > Damian >> > > > > >> > > > > On Fri, 30 Jun 2017 at 16:04 Dmitriy Vsekhvalnov < >> > > dvsekhval...@gmail.com >> > > > > >> > > > > wrote: >> > > > > >> > > > > > Set org.apache.kafka.streams to DEBUG. >> > > > > > >> > > > > > Here is gist: >> > > > > > >> > > > > > >> > https://gist.github.com/dvsekhvalnov/b84b72349837f6c6394f1adfe18cdb >> > > > > 61#file-debug-logs >> > > > > > >> > > > > > On Fri, Jun 30, 2017 at 12:37 PM, Dmitriy Vsekhvalnov < >> > > > > > dvsekhval...@gmail.com> wrote: >> > > > > > >> > > > > > > Sure, how to enable debug logs? Just adjust logback logger? >> > > > > > > >> > > > > > > >> > > > > > > On Fri, Jun 30, 2017 at 11:24 Eno Thereska < >> > eno.there...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > >> > > > > > >> It’s hard to tell, the logs do not contain much, I agree. It >> > could >> > > > be >> > > > > a >> > > > > > >> number of things. >> > > > > > >> >> > > > > > >> If it’s happening as you say on restart as well (so it’s >> > > > > reproducible), >> > > > > > >> any chance you could start streaming with DEBUG logs on and >> > > collect >> > > > > > those >> > > > > > >> logs? I’m hoping something shows up there. >> > > > > > >> >> > > > > > >> Thanks, >> > > > > > >> Eno >> > > > > > >> >> > > > > > >> >> > > > > > >> > On Jun 28, 2017, at 5:30 PM, Dmitriy Vsekhvalnov < >> > > > > > >> dvsekhval...@gmail.com> wrote: >> > > > > > >> > >> > > > > > >> > Nothing for stat-change.log for giving time window. Last >> line >> > > > > logged 4 >> > > > > > >> > hours before app stopped. >> > > > > > >> > >> > > > > > >> > Any ideas so far? Personally i don't see anything relevant >> in >> > > > logs. >> > > > > > >> > >> > > > > > >> > On Wed, Jun 28, 2017 at 6:33 PM, Bill Bejeck < >> > b...@confluent.io >> > > > >> > > > > > wrote: >> > > > > > >> > >> > > > > > >> >> Sure, couldn't hurt. >> > > > > > >> >> >> > > > > > >> >> Thanks, >> > > > > > >> >> Bill >> > > > > > >> >> >> > > > > > >> >> On Wed, Jun 28, 2017 at 9:51 AM, Dmitriy Vsekhvalnov < >> > > > > > >> >> dvsekhval...@gmail.com >> > > > > > >> >>> wrote: >> > > > > > >> >> >> > > > > > >> >>> Here are logs: >> > > > > > >> >>> >> > > > > > >> >>> app: >> > > > > > >> >>> https://gist.github.com/dvsekhvalnov/ >> > > > > f98afc3463f0c63b1722417e3710a8 >> > > > > > >> >>> e7#file-kafka-streams-log >> > > > > > >> >>> brokers: >> > > > > > >> >>> https://gist.github.com/dvsekhvalnov/ >> > > > > 8e870f7347394e8d004c282880ef38 >> > > > > > >> >>> 5a#file-kafka-broker-1-2-3-log >> > > > > > >> >>> >> > > > > > >> >>> All broker logs are same, so single gist. >> > > > > > >> >>> >> > > > > > >> >>> There are also state-change.log files, do you want to >> take a >> > > > look >> > > > > at >> > > > > > >> >> those >> > > > > > >> >>> as well? >> > > > > > >> >>> >> > > > > > >> >>> On Wed, Jun 28, 2017 at 4:31 PM, Bill Bejeck < >> > > b...@confluent.io >> > > > > >> > > > > > >> wrote: >> > > > > > >> >>> >> > > > > > >> >>>> Hi Dmitry, >> > > > > > >> >>>> >> > > > > > >> >>>> At the moment I don't have anything specific to look >> for, >> > > just >> > > > > > trying >> > > > > > >> >> to >> > > > > > >> >>>> get more context around the issue. >> > > > > > >> >>>> >> > > > > > >> >>>> As for the logs maybe broker and streams logs for the >> last >> > 30 >> > > > > > minutes >> > > > > > >> >> up >> > > > > > >> >>> to >> > > > > > >> >>>> the time the application stopped processing records. >> > > > > > >> >>>> >> > > > > > >> >>>> Thanks, >> > > > > > >> >>>> Bill >> > > > > > >> >>>> >> > > > > > >> >>>> On Wed, Jun 28, 2017 at 9:04 AM, Dmitriy Vsekhvalnov < >> > > > > > >> >>>> dvsekhval...@gmail.com >> > > > > > >> >>>>> wrote: >> > > > > > >> >>>> >> > > > > > >> >>>>> Hi Bill, >> > > > > > >> >>>>> >> > > > > > >> >>>>> 1. sure, can extract some logs, what exactly to look >> for? >> > > > There >> > > > > > are >> > > > > > >> >> 11 >> > > > > > >> >>>>> hours of logs and most of them looks like: >> > > > > > >> >>>>> >> > > > > > >> >>>>> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1] >> > > > > > >> >>>>> >> > [org.apache.kafka.streams.processor.internals.StreamThread] >> > > > > > >> >>>> [stream-thread >> > > > > > >> >>>>> [StreamThread-1] Committing all tasks because the >> commit >> > > > > interval >> > > > > > >> >>> 5000ms >> > > > > > >> >>>>> has elapsed] >> > > > > > >> >>>>> >> > > > > > >> >>>>> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1] >> > > > > > >> >>>>> >> > [org.apache.kafka.streams.processor.internals.StreamThread] >> > > > > > >> >>>> [stream-thread >> > > > > > >> >>>>> [StreamThread-1] Committing task StreamTask 0_0] >> > > > > > >> >>>>> >> > > > > > >> >>>>> [2017-06-27 03:30:50,554] [] [INFO ] [StreamThread-1] >> > > > > > >> >>>>> >> > [org.apache.kafka.streams.processor.internals.StreamThread] >> > > > > > >> >>>> [stream-thread >> > > > > > >> >>>>> [StreamThread-1] Committing task StreamTask 2_0] >> > > > > > >> >>>>> >> > > > > > >> >>>>> Something specific to search for? >> > > > > > >> >>>>> >> > > > > > >> >>>>> 2. Yes, there are more messages coming to topic. >> > > > > > >> >>>>> >> > > > > > >> >>>>> On Wed, Jun 28, 2017 at 3:43 PM, Bill Bejeck < >> > > > b...@confluent.io >> > > > > > >> > > > > > >> >>> wrote: >> > > > > > >> >>>>> >> > > > > > >> >>>>>> Hi Dimitry, >> > > > > > >> >>>>>> >> > > > > > >> >>>>>> I'm happy to help, but I could use more information. >> Can >> > > you >> > > > > > share >> > > > > > >> >>> the >> > > > > > >> >>>>>> streams logs and broker logs? >> > > > > > >> >>>>>> >> > > > > > >> >>>>>> Have you confirmed messages are still being delivered >> to >> > > > topics >> > > > > > >> >> (via >> > > > > > >> >>>>>> console consumer)? >> > > > > > >> >>>>>> >> > > > > > >> >>>>>> Thanks, >> > > > > > >> >>>>>> Bill >> > > > > > >> >>>>>> >> > > > > > >> >>>>>> On Wed, Jun 28, 2017 at 8:24 AM, Dmitriy Vsekhvalnov < >> > > > > > >> >>>>>> dvsekhval...@gmail.com >> > > > > > >> >>>>>>> wrote: >> > > > > > >> >>>>>> >> > > > > > >> >>>>>>> Hi all, >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> looking for some assistance in debugging >> kafka-streams >> > > > > > >> >> application. >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> Kafka broker 0.10.2.1 - x3 Node cluster >> > > > > > >> >>>>>>> kafka-streams 0.10.2.1 - x2 application nodes x 1 >> > stream >> > > > > thread >> > > > > > >> >>>> each. >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> In streams configuration only: >> > > > > > >> >>>>>>> - SSL transport >> > > > > > >> >>>>>>> - kafka.streams.commitIntervalMs set to 5000 >> (instead of >> > > > > default >> > > > > > >> >>>> 30s). >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> We running simple aggregation app with several >> grouping >> > > > > streams. >> > > > > > >> >>>>> Running >> > > > > > >> >>>>>> 2 >> > > > > > >> >>>>>>> instances of an app for redundancy. Both instances >> were >> > > > > working >> > > > > > >> >>>> pretty >> > > > > > >> >>>>>> fine >> > > > > > >> >>>>>>> for 11 hours 15 minutes then stopped consuming new >> > events >> > > > from >> > > > > > >> >>> topic. >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> Hosting JVM processes were working fine, just streams >> > > > stopped >> > > > > > >> >>>> reacting >> > > > > > >> >>>>> to >> > > > > > >> >>>>>>> new data. No exceptions, errors, e.t.c. in logs. >> After >> > > > > restart >> > > > > > >> >>>> streams >> > > > > > >> >>>>>>> still not consuming new messages. >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> Below is 2 last entries from kafka-streams logs from >> > both >> > > > > hosts: >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> [2017-06-27 14:45:09,663] [] [INFO ] [StreamThread-1] >> > > > > > >> >>>>>>> [org.apache.kafka.streams.processor.internals. >> > > StreamThread] >> > > > > > >> >>>>>> [stream-thread >> > > > > > >> >>>>>>> [StreamThread-1] Committing task StreamTask 4_2] >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> [2017-06-27 14:45:09,723] [] [INFO ] [StreamThread-1] >> > > > > > >> >>>>>>> [org.apache.kafka.streams.processor.internals. >> > > StreamThread] >> > > > > > >> >>>>>> [stream-thread >> > > > > > >> >>>>>>> [StreamThread-1] Committing task StreamTask 2_1] >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> Pretty puzzling why they stopped exactly same moment >> > (with >> > > > > > >> >> respect >> > > > > > >> >>> to >> > > > > > >> >>>>>>> millis). >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> Really appreciate any ideas where to dig to. >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>>> Thank you. >> > > > > > >> >>>>>>> >> > > > > > >> >>>>>> >> > > > > > >> >>>>> >> > > > > > >> >>>> >> > > > > > >> >>> >> > > > > > >> >> >> > > > > > >> >> > > > > > >> >> > > > > > >> > > > > >> > > > >> > > >> > >> > >