Hi Guozhang,

The errors have gone away after migrating both my brokers and api to
10.2.1.
But regarding the error, the specific threads moved from the running to the
not-running state.

-Sameer.

On Tue, May 9, 2017 at 12:16 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Sameer,
>
> I looked at the logs, and there is only one suspicious entry:
>
> ```
> 2017-05-03 14:26:54 WARN  StreamThread:1184 - Could not create task 0_21.
> Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_21] Failed to lock
> the state directory: /data/streampoc/LIC2-4/0_21
> ```
>
> It repeats three times and then does not show up again, but I cannot tell for sure
> since it is towards the end of the log file. This WARN entry is not
> expected to be a fatal error and would go away after some time, and should
> not hinder the apps. So my question is 1) did you see this WARN repeating
> forever and 2) how long have you observed that the app is stuck, and while
> it is stuck does the above entry never go away?
>
>
> Guozhang
>
>
> On Wed, May 3, 2017 at 10:50 PM, Sameer Kumar <sam.kum.w...@gmail.com>
> wrote:
>
> > My brokers are on version 10.1.0 and my clients are on version 10.2.0.
> > Also, please reply to all, as I am currently not subscribed to the mailing
> > list.
> >
> > -Sameer.
> >
> > On Wed, May 3, 2017 at 5:27 PM, Sameer Kumar <sam.kum.w...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > > I want to report an issue wherein the addition of a server at runtime
> > > > to my streams compute cluster caused errors and a subsequent complete
> > > > halt of the cluster. I am not sure if this is the actual cause, but it
> > > > was the one thing I did differently after an 18-hour smooth run of the
> > > > streams app.
> > >
> > >
> > >
> > > Initially, I had one machine working on my Kafka topic, which contains
> > > > impressions and clicks. The job ran overnight; in the morning I added
> > > > another machine to the cluster, and since then it gets stuck every time
> > > > after working fine for some time.
> > >
> > >
> > >
> > > Please find the kafka_log_snippet and poc_log_snippet attached.
> > >
> > >
> > >
> > > > After these nodes failed, I tried restarting just one machine in my
> > > > compute cluster to see whether it could initialize itself.
> > > >
> > > > Please find the logs for that attached as well. The following are the
> > > > log entries I saw quite often.
> > >
> > >
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-38 at offset 556717 since the current
> > > position is 557065
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-38] to broker 172.29.65.190:9092 (id: 0
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-48 at offset 607657 since the current
> > > position is 607880
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-48] to broker 172.29.65.192:9092 (id: 2
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-31 at offset 282265 since the current
> > > position is 282327
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-31] to broker 172.29.65.191:9092 (id: 1
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > > LIC2-4-licountci-4-changelog-3 at offset 499952 since the current
> > > > position is 500324
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-3] to broker 172.29.65.192:9092 (id: 2
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-21 at offset 587018 since the current
> > > position is 587227
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-21] to broker 172.29.65.192:9092 (id: 2
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-49 at offset 276209 since the current
> > > position is 276271
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-49] to broker 172.29.65.191:9092 (id: 1
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-16 at offset 592727 since the current
> > > position is 592896
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-16] to broker 172.29.65.191:9092 (id: 1
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-37 at offset 458224 since the current
> > > position is 458343
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-37] to broker 172.29.65.191:9092 (id: 1
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-59 at offset 495722 since the current
> > > position is 496113
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-59] to broker 172.29.65.190:9092 (id: 0
> > > rack: null)
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:526 - Ignoring fetched records for
> > > LIC2-4-licountci-4-changelog-35 at offset 230310 since the current
> > > position is 231236
> > >
> > > 2017-05-03 14:15:53 DEBUG Fetcher:180 - Sending fetch for partitions
> > > [LIC2-4-licountci-4-changelog-35] to broker 172.29.65.190:9092 (id: 0
> > > rack: null)
> > >
> > >
> > >
> > > Regards,
> > >
> > > -Sameer.
> > >
> >
>
>
>
> --
> -- Guozhang
>
