Is this one looking suspicious? https://gist.github.com/dvsekhvalnov/146ba41c8e78316941098997c9d2f18a#file-thread-dump
On Fri, Jun 30, 2017 at 6:10 PM, Damian Guy <damian....@gmail.com> wrote: > It looks like Task [4_2] is stuck restoring state, though it doesn't look > like there is much state to restore. > It might be helpful if you take some thread dumps to see where it is > blocked. > > Thanks, > Damian > > On Fri, 30 Jun 2017 at 16:04 Dmitriy Vsekhvalnov <dvsekhval...@gmail.com> > wrote: > > > Set org.apache.kafka.streams to DEBUG. > > > > Here is gist: > > > > https://gist.github.com/dvsekhvalnov/b84b72349837f6c6394f1adfe18cdb > 61#file-debug-logs > > > > On Fri, Jun 30, 2017 at 12:37 PM, Dmitriy Vsekhvalnov < > > dvsekhval...@gmail.com> wrote: > > > > > Sure, how to enable debug logs? Just adjust logback logger? > > > > > > > > > On Fri, Jun 30, 2017 at 11:24 Eno Thereska <eno.there...@gmail.com> > > wrote: > > > > > >> It’s hard to tell, the logs do not contain much, I agree. It could be > a > > >> number of things. > > >> > > >> If it’s happening as you say on restart as well (so it’s > reproducible), > > >> any chance you could start streaming with DEBUG logs on and collect > > those > > >> logs? I’m hoping something shows up there. > > >> > > >> Thanks, > > >> Eno > > >> > > >> > > >> > On Jun 28, 2017, at 5:30 PM, Dmitriy Vsekhvalnov < > > >> dvsekhval...@gmail.com> wrote: > > >> > > > >> > Nothing for stat-change.log for giving time window. Last line > logged 4 > > >> > hours before app stopped. > > >> > > > >> > Any ideas so far? Personally i don't see anything relevant in logs. > > >> > > > >> > On Wed, Jun 28, 2017 at 6:33 PM, Bill Bejeck <b...@confluent.io> > > wrote: > > >> > > > >> >> Sure, couldn't hurt. > > >> >> > > >> >> Thanks, > > >> >> Bill > > >> >> > > >> >> On Wed, Jun 28, 2017 at 9:51 AM, Dmitriy Vsekhvalnov < > > >> >> dvsekhval...@gmail.com > > >> >>> wrote: > > >> >> > > >> >>> Here are logs: > > >> >>> > > >> >>> app: > > >> >>> https://gist.github.com/dvsekhvalnov/ > f98afc3463f0c63b1722417e3710a8 > > >> >>> e7#file-kafka-streams-log > > >> >>> brokers: > > >> >>> https://gist.github.com/dvsekhvalnov/ > 8e870f7347394e8d004c282880ef38 > > >> >>> 5a#file-kafka-broker-1-2-3-log > > >> >>> > > >> >>> All broker logs are same, so single gist. > > >> >>> > > >> >>> There are also state-change.log files, do you want to take a look > at > > >> >> those > > >> >>> as well? > > >> >>> > > >> >>> On Wed, Jun 28, 2017 at 4:31 PM, Bill Bejeck <b...@confluent.io> > > >> wrote: > > >> >>> > > >> >>>> Hi Dmitry, > > >> >>>> > > >> >>>> At the moment I don't have anything specific to look for, just > > trying > > >> >> to > > >> >>>> get more context around the issue. > > >> >>>> > > >> >>>> As for the logs maybe broker and streams logs for the last 30 > > minutes > > >> >> up > > >> >>> to > > >> >>>> the time the application stopped processing records. > > >> >>>> > > >> >>>> Thanks, > > >> >>>> Bill > > >> >>>> > > >> >>>> On Wed, Jun 28, 2017 at 9:04 AM, Dmitriy Vsekhvalnov < > > >> >>>> dvsekhval...@gmail.com > > >> >>>>> wrote: > > >> >>>> > > >> >>>>> Hi Bill, > > >> >>>>> > > >> >>>>> 1. sure, can extract some logs, what exactly to look for? There > > are > > >> >> 11 > > >> >>>>> hours of logs and most of them looks like: > > >> >>>>> > > >> >>>>> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1] > > >> >>>>> [org.apache.kafka.streams.processor.internals.StreamThread] > > >> >>>> [stream-thread > > >> >>>>> [StreamThread-1] Committing all tasks because the commit > interval > > >> >>> 5000ms > > >> >>>>> has elapsed] > > >> >>>>> > > >> >>>>> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1] > > >> >>>>> [org.apache.kafka.streams.processor.internals.StreamThread] > > >> >>>> [stream-thread > > >> >>>>> [StreamThread-1] Committing task StreamTask 0_0] > > >> >>>>> > > >> >>>>> [2017-06-27 03:30:50,554] [] [INFO ] [StreamThread-1] > > >> >>>>> [org.apache.kafka.streams.processor.internals.StreamThread] > > >> >>>> [stream-thread > > >> >>>>> [StreamThread-1] Committing task StreamTask 2_0] > > >> >>>>> > > >> >>>>> Something specific to search for? > > >> >>>>> > > >> >>>>> 2. Yes, there are more messages coming to topic. > > >> >>>>> > > >> >>>>> On Wed, Jun 28, 2017 at 3:43 PM, Bill Bejeck <b...@confluent.io > > > > >> >>> wrote: > > >> >>>>> > > >> >>>>>> Hi Dimitry, > > >> >>>>>> > > >> >>>>>> I'm happy to help, but I could use more information. Can you > > share > > >> >>> the > > >> >>>>>> streams logs and broker logs? > > >> >>>>>> > > >> >>>>>> Have you confirmed messages are still being delivered to topics > > >> >> (via > > >> >>>>>> console consumer)? > > >> >>>>>> > > >> >>>>>> Thanks, > > >> >>>>>> Bill > > >> >>>>>> > > >> >>>>>> On Wed, Jun 28, 2017 at 8:24 AM, Dmitriy Vsekhvalnov < > > >> >>>>>> dvsekhval...@gmail.com > > >> >>>>>>> wrote: > > >> >>>>>> > > >> >>>>>>> Hi all, > > >> >>>>>>> > > >> >>>>>>> looking for some assistance in debugging kafka-streams > > >> >> application. > > >> >>>>>>> > > >> >>>>>>> Kafka broker 0.10.2.1 - x3 Node cluster > > >> >>>>>>> kafka-streams 0.10.2.1 - x2 application nodes x 1 stream > thread > > >> >>>> each. > > >> >>>>>>> > > >> >>>>>>> In streams configuration only: > > >> >>>>>>> - SSL transport > > >> >>>>>>> - kafka.streams.commitIntervalMs set to 5000 (instead of > default > > >> >>>> 30s). > > >> >>>>>>> > > >> >>>>>>> We running simple aggregation app with several grouping > streams. > > >> >>>>> Running > > >> >>>>>> 2 > > >> >>>>>>> instances of an app for redundancy. Both instances were > working > > >> >>>> pretty > > >> >>>>>> fine > > >> >>>>>>> for 11 hours 15 minutes then stopped consuming new events from > > >> >>> topic. > > >> >>>>>>> > > >> >>>>>>> Hosting JVM processes were working fine, just streams stopped > > >> >>>> reacting > > >> >>>>> to > > >> >>>>>>> new data. No exceptions, errors, e.t.c. in logs. After > restart > > >> >>>> streams > > >> >>>>>>> still not consuming new messages. > > >> >>>>>>> > > >> >>>>>>> Below is 2 last entries from kafka-streams logs from both > hosts: > > >> >>>>>>> > > >> >>>>>>> [2017-06-27 14:45:09,663] [] [INFO ] [StreamThread-1] > > >> >>>>>>> [org.apache.kafka.streams.processor.internals.StreamThread] > > >> >>>>>> [stream-thread > > >> >>>>>>> [StreamThread-1] Committing task StreamTask 4_2] > > >> >>>>>>> > > >> >>>>>>> [2017-06-27 14:45:09,723] [] [INFO ] [StreamThread-1] > > >> >>>>>>> [org.apache.kafka.streams.processor.internals.StreamThread] > > >> >>>>>> [stream-thread > > >> >>>>>>> [StreamThread-1] Committing task StreamTask 2_1] > > >> >>>>>>> > > >> >>>>>>> Pretty puzzling why they stopped exactly same moment (with > > >> >> respect > > >> >>> to > > >> >>>>>>> millis). > > >> >>>>>>> > > >> >>>>>>> Really appreciate any ideas where to dig to. > > >> >>>>>>> > > >> >>>>>>> Thank you. > > >> >>>>>>> > > >> >>>>>> > > >> >>>>> > > >> >>>> > > >> >>> > > >> >> > > >> > > >> > > >