I will reply on that thread.

On Tue, May 16, 2017 at 2:37 AM, Sameer Kumar <sam.kum.w...@gmail.com> wrote:
> Hi Guozhang,
>
> The errors have gone away after migrating both my brokers and api to
> 10.2.1. But, regarding the error, the specific threads moved from running
> to not running state.
>
> -Sameer.
>
> On Tue, May 9, 2017 at 12:16 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Sameer,
> >
> > I looked at the logs, and there is only one suspicious entry:
> >
> > ```
> > 2017-05-03 14:26:54 WARN StreamThread:1184 - Could not create task 0_21. Will retry.
> > org.apache.kafka.streams.errors.LockException: task [0_21] Failed to lock the state directory: /data/streampoc/LIC2-4/0_21
> > ```
> >
> > It repeated three times and then did not show up, but I cannot tell for
> > sure since it is towards the end of the log file. This WARN entry is not
> > expected to be a fatal error and would go away after some time, and
> > should not hinder the apps. So my questions are: 1) did you see this WARN
> > repeating forever, and 2) how long have you observed that the app is
> > stuck, and while it is stuck does the above entry never go away?
> >
> >
> > Guozhang
> >
> >
> > On Wed, May 3, 2017 at 10:50 PM, Sameer Kumar <sam.kum.w...@gmail.com>
> > wrote:
> >
> > > My brokers are on version 10.1.0 and my clients are on version 10.2.0.
> > > Also, please reply to all; I am currently not subscribed to the mailing
> > > list.
> > >
> > > -Sameer.
> > >
> > > On Wed, May 3, 2017 at 5:27 PM, Sameer Kumar <sam.kum.w...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I want to report an issue wherein the addition of a server at runtime
> > > > in my streams compute cluster caused errors and a subsequent complete
> > > > halt of the cluster. I am not sure if this is the actual issue, but
> > > > this was the one thing I did differently during an 18-hour smooth run
> > > > of the streams app.
> > > >
> > > > Initially, I had one machine working on my Kafka topic, which contains
> > > > impressions and clicks. The job was running overnight; in the morning
> > > > I just added another machine to the cluster, and this is when, every
> > > > time, it got stuck after working fine for some time.
> > > >
> > > > Please find the kafka_log_snippet and poc_log_snippet attached.
> > > >
> > > > Thereafter, following the failure of these nodes, I tried to restart
> > > > just one machine on my compute cluster to see if it could initialize
> > > > itself.
> > > >
> > > > Please find the logs attached for the same as well. Following are the
> > > > logs I saw quite often.
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-38 at offset 556717 since the current position is 557065
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-38] to broker 172.29.65.190:9092 (id: 0 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-48 at offset 607657 since the current position is 607880
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-48] to broker 172.29.65.192:9092 (id: 2 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-31 at offset 282265 since the current position is 282327
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-31] to broker 172.29.65.191:9092 (id: 1 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-3 at offset 499952 since the current position is 500324
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-3] to broker 172.29.65.192:9092 (id: 2 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-21 at offset 587018 since the current position is 587227
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-21] to broker 172.29.65.192:9092 (id: 2 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-49 at offset 276209 since the current position is 276271
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-49] to broker 172.29.65.191:9092 (id: 1 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-16 at offset 592727 since the current position is 592896
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-16] to broker 172.29.65.191:9092 (id: 1 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-37 at offset 458224 since the current position is 458343
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-37] to broker 172.29.65.191:9092 (id: 1 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-59 at offset 495722 since the current position is 496113
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-59] to broker 172.29.65.190:9092 (id: 0 rack: null)
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for LIC2-4-licountci-4-changelog-35 at offset 230310 since the current position is 231236
> > > >
> > > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions [LIC2-4-licountci-4-changelog-35] to broker 172.29.65.190:9092 (id: 0 rack: null)
> > > >
> > > > Regards,
> > > >
> > > > -Sameer.
> >
> > --
> > -- Guozhang

--
-- Guozhang
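
For anyone triaging similar logs, the repeated "Ignoring fetched records" DEBUG entries quoted above can be summarized per changelog partition. The following is only a rough sketch, not part of the original exchange: it assumes each log entry sits on a single line in the actual log file, and the names `IGNORED` and `stale_fetches` are hypothetical helpers introduced here for illustration.

```python
import re

# Pattern for the Fetcher DEBUG entry quoted in the thread (assumed to be on one line).
IGNORED = re.compile(
    r"Ignoring fetched records for (?P<partition>\S+) "
    r"at offset (?P<offset>\d+) since the current position is (?P<position>\d+)"
)


def stale_fetches(lines):
    """Return (partition, fetched_offset, current_position, gap) for each matching line."""
    results = []
    for line in lines:
        match = IGNORED.search(line)
        if match:
            offset = int(match.group("offset"))
            position = int(match.group("position"))
            # gap = how far the consumer position is ahead of the discarded fetch
            results.append((match.group("partition"), offset, position, position - offset))
    return results


# Two entries copied from the thread; only the first matches the pattern.
sample = [
    "2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for "
    "LIC2-4-licountci-4-changelog-38 at offset 556717 since the current position is 557065",
    "2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions "
    "[LIC2-4-licountci-4-changelog-38] to broker 172.29.65.190:9092 (id: 0 rack: null)",
]

for row in stale_fetches(sample):
    print(row)
```

Feeding the full log file through `stale_fetches` (for example, `stale_fetches(open(path))` with a path of your choosing) would show whether the same partitions keep discarding fetches over time or whether the entries taper off.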