Hi, Kishore, First, I would like to ask which version of Samza you are running? And if you can attach the log and config of your container (i.e. I assume the log you attached here is a container log?), it would be greatly helpful.
Thanks a lot! -Yi On Mon, Dec 14, 2015 at 5:07 AM, Kishore N C <kishor...@gmail.com> wrote: > Hi, > > I have a 25 node Samza cluster and I am running a job on a dataset of a > billion records that is backed by a 7 node Kafka cluster. > > Some of the tasks on some of the Samza nodes don't seem to start at all > (while other tasks run fine on other nodes). The specific error message I > see is in the task log is: > > 2015-12-14 12:50:50 ClientUtils$ [INFO] Fetching metadata from broker > id:5,host:10.181.18.87,port:9082 with correlation id 0 for 2 topic(s) > Set(TOPIC_ONE, TOPIC_TWO) > 2015-12-14 12:50:50 SyncProducer [INFO] Connected to 10.181.18.87:9082 for > producing > 2015-12-14 12:50:50 SyncProducer [INFO] Disconnecting from > 10.181.18.87:9082 > 2015-12-14 12:51:22 SimpleConsumer [INFO] Reconnect due to socket error: > java.nio.channels.ClosedChannelException > > Sometimes, there is a variation like this: > > 2015-12-14 13:05:47 ClientUtils$ [WARN] Fetching topic metadata with > correlation id 0 for topics [Set(TOPIC_ONE, TOPIC_TWO)] from > broker [id:6,host:10.181.18.193,port:9082] failed > java.nio.channels.ClosedChannelException > at kafka.network.BlockingChannel.send(BlockingChannel.scala:100) > at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73) > at > > kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72) > at kafka.producer.SyncProducer.send(SyncProducer.scala:113) > at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58) > at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93) > at > > org.apache.samza.util.ClientUtilTopicMetadataStore.getTopicInfo(ClientUtilTopicMetadataStore.scala:37) > at > > org.apache.samza.system.kafka.KafkaSystemAdmin.getTopicMetadata(KafkaSystemAdmin.scala:214) > at > > org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$getSystemStreamMetadata$2$$anonfun$6.apply(KafkaSystemAdmin.scala:158) > at > > org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$getSystemStreamMetadata$2$$anonfun$6.apply(KafkaSystemAdmin.scala:158) > at > > org.apache.samza.system.kafka.TopicMetadataCache$.getTopicMetadata(TopicMetadataCache.scala:52) > at > > org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$getSystemStreamMetadata$2.apply(KafkaSystemAdmin.scala:155) > at > > org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$getSystemStreamMetadata$2.apply(KafkaSystemAdmin.scala:154) > at > > org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:82) > > The above logs just keeps looping and the task never starts processing > input. I was able to telnet into the host/port combination from the same > machine. Any idea/pointers to what could be going wrong is greatly > appreciated. > > Thanks, > > KN. >