Sorry, I don't see anything in the attachment. Can you please re-attach and
re-send?
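
In the meantime, here is a minimal sketch of the behavior you describe
expecting from SAMZA-911: retry with exponential backoff, but give up after a
fixed number of failed attempts so the process can stop instead of sleeping
forever. This is not Samza's actual code. The class name `BoundedRetry`, the
exception `RetriesExhaustedException`, and all parameters are made up for
illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Samza or Kafka.
final class BoundedRetry {
    private BoundedRetry() {}

    /** Thrown once every attempt has failed; the caller is expected to shut down. */
    public static final class RetriesExhaustedException extends RuntimeException {
        public RetriesExhaustedException(int attempts, Throwable lastError) {
            super("giving up after " + attempts + " attempts", lastError);
        }
    }

    /**
     * Runs the action up to maxAttempts times. Between failed attempts it
     * sleeps, doubling the delay each time up to maxDelayMs.
     */
    public static <T> T run(Callable<T> action, int maxAttempts,
                            long initialDelayMs, long maxDelayMs)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                lastError = e; // remember the failure and retry
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        // Unlike an unbounded retry loop, fail loudly here.
        throw new RetriesExhaustedException(maxAttempts, lastError);
    }
}
```

For example, `BoundedRetry.run(() -> send(msg), 30, 100L, 10_000L)` would try a
hypothetical `send(msg)` (assumed to return a value) at most 30 times and then
throw, which the caller could treat as a signal to shut the container down.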

On Wed, Aug 31, 2016 at 3:27 AM, 李斯宁 <lisin...@gmail.com> wrote:

> It seems upgrading did not solve the problem. All tasks hung during today's
> "rush hour".
> I have attached the log and jstack output.
>
> SAMZA-911 is supposed to fix this by stopping the process after too many
> failed retries, but the process is still there and hanging.
>
> On Mon, Aug 22, 2016 at 1:14 PM, 李斯宁 <lisin...@gmail.com> wrote:
>
>> Thanks so much, I'll try.
>>
>> On Mon, Aug 22, 2016 at 6:26 AM, Yi Pan <nickpa...@gmail.com> wrote:
>>
>>> Hi, Sining,
>>>
>>> This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try to
>>> upgrade to 0.10.1.
>>>
>>> Thanks!
>>>
>>> -Yi
>>>
>>> On Sun, Aug 21, 2016 at 5:55 AM, 李斯宁 <lisin...@gmail.com> wrote:
>>>
>>> > I have tried restarting every Kafka server. The container did not recover.
>>> >
>>> > The log contains the following:
>>> >
>>> > 2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > 2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> > 2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> >
>>> >
>>> > jstack shows:
>>> >
>>> > "main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
>>> > java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> > at java.lang.Thread.sleep(Native Method)
>>> > at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
>>> > at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
>>> > at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
>>> > at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
>>> > at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
>>> > at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
>>> > at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
>>> > at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
>>> > at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
>>> > at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
>>> > at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
>>> > at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
>>> > at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
>>> > at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
>>> > at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
>>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
>>> > at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
>>> > at scala.collection.immutable.List.foreach(List.scala:318)
>>> > at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
>>> > at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
>>> > at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
>>> > at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
>>> > at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
>>> > at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
>>> > at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
>>> > at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
>>> > at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
>>> >
>>> > Maybe the partition leader changed during rush hour, and the metrics-writing
>>> > code does not recognize that and keeps retrying again and again?
>>> >
>>> > Any response is appreciated :)
>>> >
>>> > On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <lisin...@gmail.com> wrote:
>>> >
>>> > > The end of the container's log shows these lines:
>>> > >
>>> > > 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > > 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > >
>>> > >
>>> > > On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <lisin...@gmail.com> wrote:
>>> > >
>>> > >> Hi guys,
>>> > >> I'm using Samza for real-time processing. After running for about 10
>>> > >> hours, some containers paused and stopped processing.
>>> > >>
>>> > >> When I looked into the log, I found a lot of these:
>>> > >>
>>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>> > >>
>>> > >> ...
>>> > >>
>>> > >> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > >> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > >> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > >> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>> > >> 2
>>> > >>
>>> > >> This has been happening since the "rush hour" of new messages being
>>> > >> produced to Kafka. Maybe this is a bug in Kafka or Samza?
>>> > >>
>>> > >> Kafka version: 0.10.0.0
>>> > >>
>>> > >> The Kafka config and part of the log from the paused container are attached.
>>> > >>
>>> > >>
>>> > >>
>>> > >
>>> > >
>>> > > --
>>> > > 李斯宁
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > 李斯宁
>>> >
>>>
>>
>>
>>
>> --
>> 李斯宁
>>
>
>
>
> --
> 李斯宁
>



-- 
Thanks and regards

Chinmay Soman
