I have tried restarting every Kafka server, but the container did not recover.

The log shows the following:

2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender                 :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION


jstack shows:

"main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)

Maybe the partition leader changed during rush hour, and the metrics writer
does not notice the change and keeps retrying again and again?
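
The suspected behaviour can be sketched roughly as below. This is a simplified
Python model for illustration only, not Samza's or Kafka's actual code:
send_with_retry, NotLeaderError and make_stale_sender are hypothetical names.
It shows how a retry loop with exponential backoff and no attempt limit keeps
sleeping and retrying for as long as the send fails, which would match the
TIMED_WAITING main thread in the jstack output.

```python
import time


class NotLeaderError(Exception):
    """Hypothetical stand-in for Kafka's NotLeaderForPartitionException."""


def send_with_retry(send, max_attempts=None, initial_delay=0.01,
                    max_delay=0.1, sleep=time.sleep):
    """Call send() until it succeeds, backing off exponentially.

    With max_attempts=None the loop never gives up, so the caller
    blocks for as long as send() keeps raising NotLeaderError.
    Returns the number of attempts made.
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            send()
            return attempts
        except NotLeaderError:
            if max_attempts is not None and attempts >= max_attempts:
                raise  # bounded variant: surface the error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff


def make_stale_sender(failures):
    """Build a send() that fails `failures` times, then succeeds.

    Models a producer whose view of the partition leader is stale
    for a while and is eventually refreshed.
    """
    remaining = [failures]

    def send():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise NotLeaderError("not the leader for that topic-partition")

    return send


if __name__ == "__main__":
    print("attempts needed:", send_with_retry(make_stale_sender(3)))
```

If the leader information were never refreshed, the unbounded variant would
never return; passing max_attempts would instead make the failure visible to
the caller.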

Any response is appreciated :)

On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <lisin...@gmail.com> wrote:

> At the end of the container's log, these lines are printed:
>
> 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
> 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) Retrying 
> send messsage due to RetriableException - 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Turn on debugging to get a full 
> stack trace
>
>
> On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <lisin...@gmail.com> wrote:
>
>> Hi guys,
>> I'm using Samza for realtime processing. After running for about 10 hours,
>> some containers paused and stopped processing.
>>
>> When I looked into the log, I found a lot of lines like these:
>>
>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got 
>> error produce response with correlation id 490345 on topic-partition 
>> test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts left). 
>> Error: NOT_LEADER_FOR_PARTITION
>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got 
>> error produce response with correlation id 490345 on topic-partition 
>> test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts left). 
>> Error: NOT_LEADER_FOR_PARTITION
>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got 
>> error produce response with correlation id 490345 on topic-partition 
>> test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts left). 
>> Error: NOT_LEADER_FOR_PARTITION
>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got 
>> error produce response with correlation id 490346 on topic-partition 
>> test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts left). 
>> Error: NOT_LEADER_FOR_PARTITION
>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got 
>> error produce response with correlation id 490346 on topic-partition 
>> test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts left). 
>> Error: NOT_LEADER_FOR_PARTITION
>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender                 :257) Got 
>> error produce response with correlation id 490346 on topic-partition 
>> test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts left). 
>> Error: NOT_LEADER_FOR_PARTITION
>>
>> ...
>>
>> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) 
>> Retrying send messsage due to RetriableException - 
>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server 
>> is not the leader for that topic-partition.. Turn on debugging to get a full 
>> stack trace
>> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) 
>> Retrying send messsage due to RetriableException - 
>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server 
>> is not the leader for that topic-partition.. Turn on debugging to get a full 
>> stack trace
>> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) 
>> Retrying send messsage due to RetriableException - 
>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server 
>> is not the leader for that topic-partition.. Turn on debugging to get a full 
>> stack trace
>> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer      :66 ) 
>> Retrying send messsage due to RetriableException - 
>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server 
>> is not the leader for that topic-partition.. Turn on debugging to get a full 
>> stack trace
>> 2
>>
>> This has been happening since the "rush hour" for new messages produced to
>> Kafka. Maybe this is a bug in Kafka or Samza?
>>
>> Kafka version: 0.10.0.0
>>
>> The Kafka config and part of the paused container's log are attached.
>>
>>
>>
>
>
> --
> 李斯宁
>



-- 
李斯宁
