Sorry, I don't see anything in the attachment. Can you please re-attach and re-send?
On Wed, Aug 31, 2016 at 3:27 AM, 李斯宁 <lisin...@gmail.com> wrote:

> It seems upgrading does not solve the problem. All tasks hung during
> today's "rush hour".
> I attached the log and the jstack output.
>
> SAMZA-911 is supposed to fix this by stopping the process if it fails too
> many times. But the process is still there, hanging.
>
> On Mon, Aug 22, 2016 at 1:14 PM, 李斯宁 <lisin...@gmail.com> wrote:
>
>> Thanks so much, I'll try.
>>
>> On Mon, Aug 22, 2016 at 6:26 AM, Yi Pan <nickpa...@gmail.com> wrote:
>>
>>> Hi, Sining,
>>>
>>> This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try to
>>> upgrade to 0.10.1.
>>>
>>> Thanks!
>>>
>>> -Yi
>>>
>>> On Sun, Aug 21, 2016 at 5:55 AM, 李斯宁 <lisin...@gmail.com> wrote:
>>>
>>>> I have tried restarting every Kafka server. The container did not
>>>> recover.
>>>>
>>>> The log contains the following:
>>>>
>>>> 2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>> 2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>> 2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>>
>>>> jstack shows:
>>>>
>>>> "main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
>>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>>         at java.lang.Thread.sleep(Native Method)
>>>>         at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
>>>>         at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
>>>>         at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
>>>>         at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
>>>>         at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
>>>>         at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
>>>>         at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
>>>>         at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
>>>>         at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
>>>>         at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
>>>>         at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
>>>>         at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
>>>>         at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
>>>>         at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
>>>>         at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
>>>>         at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
>>>>         at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
>>>>         at scala.collection.immutable.List.foreach(List.scala:318)
>>>>         at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
>>>>         at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
>>>>         at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
>>>>         at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
>>>>         at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
>>>>         at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
>>>>         at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
>>>>         at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
>>>>         at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
>>>>
>>>> Maybe the partition leader changed during rush hour, and the metrics
>>>> writer does not notice that and retries again and again?
>>>>
>>>> Any response is appreciated :)
>>>>
>>>> On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <lisin...@gmail.com> wrote:
>>>>
>>>>> At the end of the container's log, it prints the following:
>>>>>
>>>>> 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>> 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>>
>>>>> On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <lisin...@gmail.com> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>> I'm using Samza for real-time processing. After running for about 10
>>>>>> hours, some containers paused and stopped processing.
>>>>>>
>>>>>> When I looked into the log, I found a lot of these:
>>>>>>
>>>>>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>>>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>>>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>>>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>>>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>>>> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>>> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>>> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>>> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
>>>>>>
>>>>>> This started with the "rush hour" of new messages being produced to
>>>>>> Kafka. Maybe this is a bug in Kafka / Samza?
>>>>>>
>>>>>> Kafka version: 0.10.0.0
>>>>>>
>>>>>> The Kafka config and part of the paused container's log are attached.
>>>>>
>>>>> --
>>>>> 李斯宁
>>>>
>>>> --
>>>> 李斯宁
>>
>> --
>> 李斯宁
>
> --
> 李斯宁

--
Thanks and regards
Chinmay Soman
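A note for readers following this thread: the jstack shows the container's main thread sleeping inside `ExponentialSleepStrategy`, called from `KafkaSystemProducer.send`. The log shows a new retry roughly every 10 seconds, which looks like a backoff that has hit its cap and keeps looping. The Java sketch below illustrates that general pattern under stated assumptions: a retry loop whose delay grows exponentially up to a cap and which, like the behavior this thread attributes to SAMZA-911, gives up after a maximum number of attempts instead of retrying forever. The class `BoundedRetry`, its parameters, and the 10-second cap are hypothetical choices for illustration. This is not Samza's actual `ExponentialSleepStrategy` code, and `IllegalStateException` stands in for Kafka's retriable `NotLeaderForPartitionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical bounded exponential-backoff retry helper (illustration only). */
public class BoundedRetry {

    /** Thrown once an operation has failed maxAttempts times in a row. */
    public static class RetriesExhaustedException extends RuntimeException {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Abstraction over sleeping so demos and tests can skip real delays. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    private final long initialDelayMs;
    private final double multiplier;
    private final long maxDelayMs;
    private final int maxAttempts;
    private final Sleeper sleeper;

    public BoundedRetry(long initialDelayMs, double multiplier, long maxDelayMs,
                        int maxAttempts, Sleeper sleeper) {
        if (initialDelayMs < 0 || multiplier < 1.0 || maxAttempts < 1) {
            throw new IllegalArgumentException("invalid retry parameters");
        }
        this.initialDelayMs = initialDelayMs;
        this.multiplier = multiplier;
        this.maxDelayMs = maxDelayMs;
        this.maxAttempts = maxAttempts;
        this.sleeper = sleeper;
    }

    /** Delay before retry n (1-based): initialDelayMs * multiplier^(n-1), capped at maxDelayMs. */
    public long delayForRetry(int retryNumber) {
        double delay = initialDelayMs * Math.pow(multiplier, retryNumber - 1);
        return (long) Math.min(delay, (double) maxDelayMs);
    }

    /**
     * Calls op until it succeeds. Exceptions rejected by isRetriable propagate
     * immediately; after maxAttempts failed attempts the loop stops and throws
     * RetriesExhaustedException instead of sleeping forever.
     */
    public <T> T run(Callable<T> op, Predicate<Exception> isRetriable) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isRetriable.test(e)) {
                    throw e;
                }
                if (attempt >= maxAttempts) {
                    throw new RetriesExhaustedException("Giving up after " + attempt + " attempts", e);
                }
                sleeper.sleep(delayForRetry(attempt));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Record requested delays instead of really sleeping.
        List<Long> sleeps = new ArrayList<>();
        BoundedRetry retry = new BoundedRetry(100, 2.0, 10_000, 5, millis -> sleeps.add(millis));

        // Simulated send that fails three times (as if the partition leader moved), then succeeds.
        int[] calls = {0};
        String result = retry.run(() -> {
            calls[0]++;
            if (calls[0] <= 3) {
                throw new IllegalStateException("simulated NOT_LEADER_FOR_PARTITION");
            }
            return "sent";
        }, e -> e instanceof IllegalStateException);

        System.out.println(result + " after " + calls[0] + " calls, sleeps=" + sleeps);
    }
}
```

Running `main` should print `sent after 4 calls, sleeps=[100, 200, 400]`. The important design choice is the `maxAttempts` bound. When the loop gives up, the caller can fail the container so it gets restarted, or it can drop or park the message. The problem described in this thread is that the container did neither: it stopped making progress but never exited. A bounded loop at least turns a silent hang into a visible failure.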