Adrian,

Thanks for your response. I just looked at both machines we're testing on, and the Kafka server process looks OK on both. Is there anything else specific I can check?
From googling around, I see some posts where folks suggest checking the DNS settings (those appear fine) and setting advertised.host.name in Kafka's server.properties. Yay/nay?

Thanks again.

On Tue, Sep 29, 2015 at 8:31 AM, Adrian Tanase <atan...@adobe.com> wrote:

> I believe some of the brokers in your cluster died and there are a number
> of partitions that nobody is currently managing.
>
> -adrian
>
> From: Dmitry Goldenberg
> Date: Tuesday, September 29, 2015 at 3:26 PM
> To: "user@spark.apache.org"
> Subject: Kafka error "partitions don't have a leader" / LeaderNotAvailableException
>
> I apologize for posting this Kafka related issue into the Spark list. Have
> gotten no responses on the Kafka list and was hoping someone on this list
> could shed some light on the below.
>
> ---------------------------------------------------------------------------------------
>
> We're running into this issue in a clustered environment where we're
> trying to send messages to Kafka and are getting the below error.
>
> Can someone explain what might be causing it and what the error message
> means (Failed to send data since partitions [<topic-name>,8] don't have a
> leader) ?
>
> ---------------------------------------------------------------------------------------
>
> WARN kafka.producer.BrokerPartitionInfo: Error while fetching metadata partition 10 leader: none replicas: isr: isUnderReplicated: false for topic partition [<topic-name>,10]: [class kafka.common.LeaderNotAvailableException]
>
> ERROR kafka.producer.async.DefaultEventHandler: Failed to send requests for topics <topic-name> with correlation ids in [2398792,2398801]
>
> ERROR com.acme.core.messaging.kafka.KafkaMessageProducer: Error while sending a message to the message store. kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
>     at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90) ~[kafka_2.10-0.8.2.0.jar:?]
>     at kafka.producer.Producer.send(Producer.scala:77) ~[kafka_2.10-0.8.2.0.jar:?]
>     at kafka.javaapi.producer.Producer.send(Producer.scala:33) ~[kafka_2.10-0.8.2.0.jar:?]
>
> WARN kafka.producer.async.DefaultEventHandler: Failed to send data since partitions [<topic-name>,8] don't have a leader
>
> What do these errors and warnings mean and how do we get around them?
>
> ---------------------------------------------------------------------------------------
>
> The code for sending messages is basically as follows:
>
> public class KafkaMessageProducer {
>   private Producer<String, String> producer;
>
>   .....................
>
>   public void sendMessage(String topic, String key, String message)
>       throws IOException, MessagingException {
>     KeyedMessage<String, String> data =
>         new KeyedMessage<String, String>(topic, key, message);
>     try {
>       producer.send(data);
>     } catch (Exception ex) {
>       throw new MessagingException(
>           "Error while sending a message to the message store.", ex);
>     }
>   }
>
> Is it possible that the producer gets "stale" and needs to be
> re-initialized? Do we want to re-create the producer on every message (??)
> or is it OK to hold on to one indefinitely?
>
> ---------------------------------------------------------------------------------------
>
> The following are the producer properties that are being set into the
> producer
>
> batch.num.messages => 200
> client.id => Acme
> compression.codec => none
> key.serializer.class => kafka.serializer.StringEncoder
> message.send.max.retries => 3
> metadata.broker.list => data2.acme.com:9092,data3.acme.com:9092
> partitioner.class => kafka.producer.DefaultPartitioner
> producer.type => sync
> queue.buffering.max.messages => 10000
> queue.buffering.max.ms => 5000
> queue.enqueue.timeout.ms => -1
> request.required.acks => 1
> request.timeout.ms => 10000
> retry.backoff.ms => 1000
> send.buffer.bytes => 102400
> serializer.class => kafka.serializer.StringEncoder
> topic.metadata.refresh.interval.ms => 600000
>
> Thanks.
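
For reference, here is a minimal sketch of how the producer properties listed above might be assembled in Java. The class name ProducerSettings and its build() method are hypothetical and do not come from the original code; only the keys and values are taken from the list. The sketch uses java.util.Properties alone, so it compiles without the Kafka jar.

```java
import java.util.Properties;

// Hypothetical helper (not part of the original code): collects the
// producer settings quoted above into a java.util.Properties object.
final class ProducerSettings {
    private ProducerSettings() {}

    static Properties build() {
        Properties props = new Properties();

        // Brokers contacted to bootstrap topic metadata.
        props.setProperty("metadata.broker.list", "data2.acme.com:9092,data3.acme.com:9092");
        props.setProperty("client.id", "Acme");

        // Serialization and partitioning.
        props.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        props.setProperty("key.serializer.class", "kafka.serializer.StringEncoder");
        props.setProperty("partitioner.class", "kafka.producer.DefaultPartitioner");
        props.setProperty("compression.codec", "none");

        // Send mode, acknowledgements, retries and timeouts.
        props.setProperty("producer.type", "sync");
        props.setProperty("request.required.acks", "1");
        props.setProperty("request.timeout.ms", "10000");
        props.setProperty("message.send.max.retries", "3");
        props.setProperty("retry.backoff.ms", "1000");
        props.setProperty("send.buffer.bytes", "102400");
        props.setProperty("topic.metadata.refresh.interval.ms", "600000");

        // The Kafka 0.8 producer documentation lists these under async mode;
        // they are carried over here only because they appear in the list.
        props.setProperty("batch.num.messages", "200");
        props.setProperty("queue.buffering.max.messages", "10000");
        props.setProperty("queue.buffering.max.ms", "5000");
        props.setProperty("queue.enqueue.timeout.ms", "-1");

        return props;
    }
}
```

With the Kafka 0.8.x jar on the classpath, the result would typically be handed to the old producer API as new Producer<String, String>(new ProducerConfig(ProducerSettings.build())).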