Yes, upgraded to 0.10.1.

jstack: https://drive.google.com/open?id=0B19olQZ1dUO8VjltQmtxLTJ4SVdFZWhYWHZ3Y2hMOVhCMWNn
task log: https://drive.google.com/open?id=0B19olQZ1dUO8eVRLWmJCVl9nRlg2UUM4c21udUViWW8tSUVV

On Fri, Sep 2, 2016 at 4:41 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Sining,
>
> Your note is on a site I don't have an account for or access to, and it
> requires sign-up. Can you share it via Google Docs, since you have a Gmail
> account? And just to confirm: you have upgraded and are now using 0.10.1,
> right?
>
> Thanks, and apologies for the delay.
>
> -Yi
>
> On Fri, Sep 2, 2016 at 1:03 AM, 李斯宁 <lisin...@gmail.com> wrote:
>
> > Can anyone help with this? Thanks!
> >
> > On Thu, Sep 1, 2016 at 11:59 AM, 李斯宁 <lisin...@gmail.com> wrote:
> >
> > > If you cannot see the attachment, please try
> > > http://note.youdao.com/noteshare?id=56b826c24af47a9fdb600490ce788710
> > >
> > > On Thu, Sep 1, 2016 at 1:50 AM, Chinmay Soman <chinmay.cere...@gmail.com> wrote:
> > >
> > >> Sorry, I don't see anything in the attachment. Can you please re-attach
> > >> and re-send?
> > >>
> > >> On Wed, Aug 31, 2016 at 3:27 AM, 李斯宁 <lisin...@gmail.com> wrote:
> > >>
> > >> > It seems upgrading does not solve the problem. All tasks hung during
> > >> > today's "rush hour".
> > >> > I attached the log and jstack.
> > >> >
> > >> > SAMZA-911 is supposed to fix this by stopping the process after too
> > >> > many failed attempts, but the process is still there and hanging.
> > >> >
> > >> > On Mon, Aug 22, 2016 at 1:14 PM, 李斯宁 <lisin...@gmail.com> wrote:
> > >> >
> > >> >> Thanks so much, I'll try.
> > >> >>
> > >> >> On Mon, Aug 22, 2016 at 6:26 AM, Yi Pan <nickpa...@gmail.com> wrote:
> > >> >>
> > >> >>> Hi, Sining,
> > >> >>>
> > >> >>> This is a known bug that is fixed in 0.10.1 (SAMZA-911). Please try
> > >> >>> to upgrade to 0.10.1.
> > >> >>>
> > >> >>> Thanks!
> > >> >>>
> > >> >>> -Yi
> > >> >>>
> > >> >>> On Sun, Aug 21, 2016 at 5:55 AM, 李斯宁 <lisin...@gmail.com> wrote:
> > >> >>>
> > >> >>> > I have tried restarting every Kafka server. The container did not
> > >> >>> > recover.
> > >> >>> >
> > >> >>> > The log has the following:
> > >> >>> >
> > >> >>> > 2016-08-21 20:08:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > 2016-08-21 20:08:22 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 4364 on topic-partition samzaMetrics-5, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > 2016-08-21 20:08:23 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 4367 on topic-partition samzaMetrics-5, retrying (29 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> >
> > >> >>> > jstack shows:
> > >> >>> >
> > >> >>> > "main" #1 prio=5 os_prio=0 tid=0x00007f1ba401a000 nid=0x1a621 waiting on condition [0x00007f1bab976000]
> > >> >>> >    java.lang.Thread.State: TIMED_WAITING (sleeping)
> > >> >>> >     at java.lang.Thread.sleep(Native Method)
> > >> >>> >     at org.apache.samza.util.ExponentialSleepStrategy$RetryLoopState.sleep(ExponentialSleepStrategy.scala:105)
> > >> >>> >     at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:91)
> > >> >>> >     at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:91)
> > >> >>> >     at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
> > >> >>> >     at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
> > >> >>> >     at toolbox.analyzer2.realtime.CommonWriter.write(CommonWriter.java:50)
> > >> >>> >     at toolbox.analyzer2.realtime.InitTask.lambda$process$0(InitTask.java:110)
> > >> >>> >     at toolbox.analyzer2.realtime.InitTask$$Lambda$4/938405008.emit(Unknown Source)
> > >> >>> >     at toolbox.analyzer2.util.core.TransToKvProcessor.process(TransToKvProcessor.java:146)
> > >> >>> >     at toolbox.analyzer2.realtime.InitTask$2.emit(InitTask.java:119)
> > >> >>> >     at toolbox.analyzer2.util.core.JsonExpander.expand(JsonExpander.java:47)
> > >> >>> >     at toolbox.analyzer2.realtime.InitTask.process(InitTask.java:128)
> > >> >>> >     at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:150)
> > >> >>> >     at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
> > >> >>> >     at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:149)
> > >> >>> >     at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:122)
> > >> >>> >     at org.apache.samza.container.RunLoop$$anonfun$process$1$$anonfun$apply$mcVJ$sp$2.apply(RunLoop.scala:119)
> > >> >>> >     at scala.collection.immutable.List.foreach(List.scala:318)
> > >> >>> >     at org.apache.samza.container.RunLoop$$anonfun$process$1.apply$mcVJ$sp(RunLoop.scala:118)
> > >> >>> >     at org.apache.samza.util.TimerUtils$class.updateTimerAndGetDuration(TimerUtils.scala:51)
> > >> >>> >     at org.apache.samza.container.RunLoop.updateTimerAndGetDuration(RunLoop.scala:35)
> > >> >>> >     at org.apache.samza.container.RunLoop.process(RunLoop.scala:106)
> > >> >>> >     at org.apache.samza.container.RunLoop.run(RunLoop.scala:74)
> > >> >>> >     at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:553)
> > >> >>> >     at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:92)
> > >> >>> >     at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:66)
> > >> >>> >     at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
> > >> >>> >
> > >> >>> > Maybe the partition leader changed during rush hour, and the metrics
> > >> >>> > writer does not notice that and keeps retrying again and again?
> > >> >>> >
> > >> >>> > Any response is appreciated :)
> > >> >>> >
> > >> >>> > On Sun, Aug 21, 2016 at 8:00 PM, 李斯宁 <lisin...@gmail.com> wrote:
> > >> >>> >
> > >> >>> > > At the end of the container's log, it prints these:
> > >> >>> > >
> > >> >>> > > 2016-08-21 19:57:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:57:11 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:57:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:57:31 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:57:41 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:57:51 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:58:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:58:11 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:58:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:58:31 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:58:41 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:58:51 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > > 2016-08-21 19:59:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > >
> > >> >>> > > On Sun, Aug 21, 2016 at 7:38 PM, 李斯宁 <lisin...@gmail.com> wrote:
> > >> >>> > >
> > >> >>> > >> Hi, guys,
> > >> >>> > >> I'm using Samza for realtime processing. After running for about
> > >> >>> > >> 10 hours, some containers paused and stopped processing.
> > >> >>> > >>
> > >> >>> > >> When I looked into the log, I found a lot of these:
> > >> >>> > >>
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490345 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (18 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-3, retrying (16 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-4, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >> 2016-08-21 10:03:07 [WARN ](o.a.k.c.p.i.Sender :257) Got error produce response with correlation id 490346 on topic-partition test3_a2_mobileDictClient_android_uid_imei-6, retrying (17 attempts left). Error: NOT_LEADER_FOR_PARTITION
> > >> >>> > >>
> > >> >>> > >> ...
> > >> >>> > >>
> > >> >>> > >> 2016-08-21 10:49:01 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > >> 2016-08-21 10:49:11 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > >> 2016-08-21 10:49:21 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > >> 2016-08-21 10:49:31 [WARN ](o.a.s.s.k.KafkaSystemProducer :66 ) Retrying send messsage due to RetriableException - org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Turn on debugging to get a full stack trace
> > >> >>> > >>
> > >> >>> > >> This started with the "rush hour" of new messages produced to
> > >> >>> > >> Kafka. Maybe this is a bug in Kafka / Samza?
> > >> >>> > >>
> > >> >>> > >> Kafka version: 0.10.0.0
> > >> >>> > >>
> > >> >>> > >> The Kafka config and part of the paused log are attached.
> > >> >>> > >
> > >> >>> > > --
> > >> >>> > > 李斯宁
> > >> >>> >
> > >> >>> > --
> > >> >>> > 李斯宁
> > >> >>
> > >> >> --
> > >> >> 李斯宁
> > >> >
> > >> > --
> > >> > 李斯宁
> > >>
> > >> --
> > >> Thanks and regards
> > >>
> > >> Chinmay Soman
> > >
> > > --
> > > 李斯宁
> >
> > --
> > 李斯宁
>
--
李斯宁
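For readers following the SAMZA-911 discussion in this thread, here is a minimal sketch of the behaviour it describes. A retriable send error (such as NOT_LEADER_FOR_PARTITION) is retried with exponential backoff, but only a bounded number of times. After that, the failure propagates so the container can stop, instead of sleeping in a retry loop forever as the jstack shows. The `BoundedRetry` class, its parameters and both exception types are hypothetical illustrations, not Samza's `ExponentialSleepStrategy` API and not the actual SAMZA-911 patch.

```java
// Hypothetical sketch, not Samza's API: bounded retry with exponential backoff.
final class BoundedRetry {

    /** Hypothetical stand-in for a retriable send error such as NOT_LEADER_FOR_PARTITION. */
    static class RetriableSendException extends RuntimeException {
        RetriableSendException(String message) { super(message); }
    }

    /** Thrown once every attempt has failed; carries the last retriable failure as its cause. */
    static class RetriesExhaustedException extends RuntimeException {
        RetriesExhaustedException(String message, Throwable cause) { super(message, cause); }
    }

    /** One attempt of the operation, e.g. a single producer send. */
    @FunctionalInterface
    interface Attempt<T> {
        T call();
    }

    private final int maxAttempts;
    private final long initialDelayMs;
    private final double multiplier;
    private final long maxDelayMs;

    BoundedRetry(int maxAttempts, long initialDelayMs, double multiplier, long maxDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
        this.initialDelayMs = initialDelayMs;
        this.multiplier = multiplier;
        this.maxDelayMs = maxDelayMs;
    }

    /** Backoff before retry number `retry` (0-based): initialDelayMs * multiplier^retry, capped at maxDelayMs. */
    long delayForRetry(int retry) {
        double delay = initialDelayMs * Math.pow(multiplier, retry);
        return (long) Math.min(delay, (double) maxDelayMs);
    }

    /**
     * Runs op until it succeeds or maxAttempts retriable failures have occurred.
     * Non-retriable exceptions propagate immediately; running out of attempts
     * throws RetriesExhaustedException instead of retrying forever.
     */
    <T> T run(Attempt<T> op) {
        RetriableSendException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (RetriableSendException e) {
                last = e;
                if (attempt + 1 < maxAttempts) { // no sleep after the final attempt
                    sleep(delayForRetry(attempt));
                }
            }
        }
        throw new RetriesExhaustedException("gave up after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }

    public static void main(String[] args) {
        BoundedRetry retry = new BoundedRetry(3, 10, 2.0, 1000);
        int[] calls = {0};
        try {
            retry.run(() -> {
                calls[0]++;
                throw new RetriableSendException("NOT_LEADER_FOR_PARTITION");
            });
        } catch (RetriesExhaustedException e) {
            // The failure surfaces after 3 attempts instead of hanging the task.
            System.out.println(e.getMessage() + " (" + calls[0] + " calls)"); // prints "gave up after 3 attempts (3 calls)"
        }
    }
}
```

The design choice is to throw after the last attempt rather than loop indefinitely. That gives whatever manages the container a failure it can see and act on, such as restarting the container. Whether 0.10.1 actually behaves this way under a leader change is exactly what this thread is still trying to establish.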