Re: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.

2013-06-24 Thread Markus Roder
We had this issue as well, but the message was nevertheless enqueued four times into the cluster. It would be great to get any hint on this issue. regards -- Markus Roder On 25.06.2013 at 07:18, Yogesh Sangvikar wrote: > Hi Jun, > > The stack trace we found is as follows, > > log4j:WARN No

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Also, looking back at my logs, I'm wondering if a producer will reuse the same socket to send data to the same broker, for multiple topics (I'm guessing yes). In which case, it looks like I'm seeing this scenario: 1. producer1 is happily sending messages for topicX and topicY to serverA (serverA

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Filed https://issues.apache.org/jira/browse/KAFKA-955 On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg wrote: > Jun, > > To be clear, this whole discussion was started, because I am clearly > seeing "failed due to Leader not local" on the last broker restarted, > after all the controlled shutt

Re: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.

2013-06-24 Thread Yogesh Sangvikar
Hi Jun, The stack trace we found is as follows, log4j:WARN No appenders could be found for logger (kafka.utils.VerifiableProperties). log4j:WARN Please initialize the log4j system properly. kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries. at kafka.producer.
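The log4j:WARN lines in that trace mean no log4j configuration was found, so the producer's own error logging (which would show why each send failed) is being dropped. A minimal log4j.properties sketch to put on the producer's classpath (appender choice and pattern are illustrative):

```properties
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
```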

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Jun, To be clear, this whole discussion was started, because I am clearly seeing "failed due to Leader not local" on the last broker restarted, after all the controlled shutting down has completed and all brokers restarted. This leads me to believe that a client made a meta data request and found

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jun Rao
That should be fine since the old socket in the producer will no longer be usable after a broker is restarted. Thanks, Jun On Mon, Jun 24, 2013 at 9:50 PM, Jason Rosenberg wrote: > What about a non-controlled shutdown, and a restart, but the producer never > attempts to send anything during t

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
What about a non-controlled shutdown, and a restart, but the producer never attempts to send anything during the time the broker was down? That could have caused a leader change, but without the producer knowing to refresh its metadata, no? On Mon, Jun 24, 2013 at 9:05 PM, Jun Rao wrote: > Ot
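The scenario described here — a producer holding stale leader metadata because nothing failed while the broker was down — is exactly what the retry path is meant to catch. A simplified Python sketch of the idea (all function and variable names are hypothetical, not the actual Kafka producer API):

```python
def send_with_retry(msg, partition, metadata, fetch_metadata, send_to,
                    max_retries=3):
    """Retry a send, refreshing cached leader metadata between attempts.

    With acks > 0, a leader change surfaces as a send error and triggers
    the refresh below. With ack=0 the producer never hears back, so stale
    metadata silently sends to the wrong (or dead) broker.
    """
    for _ in range(max_retries):
        leader = metadata[partition]           # cached leader for this partition
        try:
            send_to(leader, msg)
            return True
        except IOError:
            metadata.update(fetch_metadata())  # re-learn who leads each partition
    return False
```

The key point is that the refresh only ever happens because an attempt failed, which is why ack=0 never triggers it.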

Re: Kafka User Group Meeting

2013-06-24 Thread Jun Rao
Just an update on this. We have an updated agenda with confirmed speakers from Netflix and Richrelevance on their use cases of Kafka. We will also be streaming this event for people who are remote (details to be provided in the meetup). If you plan to attend in person, please RSVP in the meetup so

Re: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.

2013-06-24 Thread Jun Rao
Could you attach the log before FailedToSendMessageException in the producer? It should tell you the reason why the message can't be sent. Thanks, Jun On Mon, Jun 24, 2013 at 9:20 PM, Yogesh Sangvikar < yogesh.sangvi...@gmail.com> wrote: > Hi Team, > > We are using kafka-0.8.0-beta1-candidate

kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.

2013-06-24 Thread Yogesh Sangvikar
Hi Team, We are using the kafka-0.8.0-beta1-candidate1 release (https://github.com/apache/kafka/tree/0.8.0-beta1-candidate1). While running a producer with the following configuration, we found an issue "kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries". We are using def

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jun Rao
Other than controlled shutdown, the only other case that can cause the leader to change when the underlying broker is alive is when the broker expires its ZK session (likely due to GC), which should be rare. That being said, forwarding in the broker may not be a bad idea. Could you file a jira to t

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Yeah, I see that with ack=0, the producer will be in a bad state anytime the leader for its partition has changed, while the broker that it thinks is the leader is still up. So this is a problem in general, not only for controlled shutdown, but even for the case where you've restarted a server (

Re: Kafka Responsiveness

2013-06-24 Thread Florin Trofin
This might work OK for 0.7 but you might run into trouble with 0.8 when replication is enabled. Make sure you test all the different scenarios for failure. See the previous discussion thread "Kafka 0.8 Failover Behavior". Let us know how it works for you. Cheers! Florin On 6/24/13 4:21 AM, "Han

Re: FAQ

2013-06-24 Thread Jun Rao
Yes, I think that would be better. Thanks, Jun On Mon, Jun 24, 2013 at 10:30 AM, Jay Kreps wrote: > I have noticed we don't do a good job of updating the FAQs. Would we > do better if I migrated it to the wiki so it was easier to edit? > > -Jay >

FAQ

2013-06-24 Thread Jay Kreps
I have noticed we don't do a good job of updating the FAQs. Would we do better if I migrated it to the wiki so it was easier to edit? -Jay

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Joel Koshy
I think Jason was suggesting quiescent time as a possibility only if the broker forwarded requests when it is not the leader. On Monday, June 24, 2013, Jun Rao wrote: > Jason, > > The quiescence time that you proposed won't work. The reason is that with > ack=0, the producer starts losing data

Re: Kafka Responsiveness

2013-06-24 Thread Jun Rao
The simplest check is to see if you can connect to the Kafka broker port. That just means the broker is up, but doesn't necessarily mean it is responsive. Another approach is to do a small write or read on an empty testing topic. Thanks, Jun On Mon, Jun 24, 2013 at 4:21 AM, Hanish Bansal < hani
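A sketch of the first check Jun mentions — a plain TCP connect to the broker port (host, port, and timeout are placeholders). The deeper responsiveness check, a small produce or fetch on an empty test topic, would need an actual Kafka client on top of this:

```python
import socket

def broker_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to the broker port succeeds.

    This only shows the broker process is listening; it does not prove
    the broker is responsive. For that, do a small produce or fetch
    against a dedicated test topic with a Kafka client.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

An autostart service could poll this and restart the process when it returns False, escalating to the produce/fetch check to catch a hung-but-listening broker.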

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jun Rao
Jason, The quiescence time that you proposed won't work. The reason is that with ack=0, the producer starts losing data silently from the moment the leader is moved (by controlled shutdown) until the broker is shut down. So, the sooner that you can shut down the broker, the better. What we realize

Kafka Responsiveness

2013-06-24 Thread Hanish Bansal
Hi, I am implementing an autostart service for kafka which will check kafka's state. If kafka is not running, it will autostart the kafka process. I also want to check whether Kafka is running or in an unresponsive state. If kafka is in an unresponsive state, how do I determine that? Is there any chance that kafk

Re: batch sending in sync mode

2013-06-24 Thread Jason Rosenberg
In async mode, I don't think I have any way of handling send failures or even knowing about them. In sync mode, I can handle exceptions, etc., after max retries have happened. Make sense? Jason On Mon, Jun 24, 2013 at 1:59 AM, Joel Koshy wrote: > If you send a list of messages in sync mode, t

Re: batch sending in sync mode

2013-06-24 Thread Sriram Subramanian
The messages will be grouped by their destination broker and further grouped by topic/partition. The send then happens to each broker with a list of topic/partitions and messages for them and waits for an acknowledgement from each broker. This happens sequentially. So, the messages are acknowledged
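A rough Python model of the collation described above (the partitioner, broker lookup, and data shapes are simplified stand-ins, not the real producer internals):

```python
import zlib
from collections import defaultdict

def collate(messages, num_partitions, leader_for):
    """Group messages by destination broker, then by (topic, partition).

    messages:       iterable of (topic, key, value) tuples
    num_partitions: partitions per topic (simplified: same for every topic)
    leader_for:     callable (topic, partition) -> broker id
    """
    batches = defaultdict(lambda: defaultdict(list))
    for topic, key, value in messages:
        partition = zlib.crc32(key.encode()) % num_partitions  # hash partitioner
        broker = leader_for(topic, partition)
        batches[broker][(topic, partition)].append(value)
    return batches

# A sync send then walks the brokers sequentially, waiting for each ack:
#   for broker, per_partition in collate(...).items():
#       send_and_await_ack(broker, per_partition)   # hypothetical helper
```

This is why one logical batch can turn into several acknowledged requests: each destination broker gets, and acks, its own sub-batch.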

Re: batch sending in sync mode

2013-06-24 Thread Joel Koshy
If you send a list of messages in sync mode, those messages will be partitioned (randomly by default) and the collated messages will be sent out in batches to each broker. i.e., the original batch may get split into smaller batches - each of those batches is acknowledged. Why not use async mode wit

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Joel Koshy
After we implement non-blocking IO for the producer, there may not be much incentive left to use ack = 0, but this is an interesting idea - not just for the controlled shutdown case, but also when leadership moves due to say, a broker's zk session expiring. Will have to think about it a bit more.

Re: Replication across Multiple Datacenters

2013-06-24 Thread Joel Koshy
I don't think replication is ideal for creating single clusters spanning DCs for at least a couple reasons: the replica assignment strategy is currently not rack or DC-aware although that can be addressed by manually creating topics. Also, network glitches and latencies which are more likely in a c

batch sending in sync mode

2013-06-24 Thread Jason Rosenberg
I have been using async mode with 0.7.2, but I'm wondering if I should switch to sync mode, so I can use the new request.required.acks mode in a sensible way. I am already managing an async queue that then dispatches to the samsa producer. I'm wondering how the acknowledgement mode works when sen
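For reference, the 0.8 producer settings involved in that switch look roughly like the following (property names per the 0.8 producer config; the broker list and values are placeholders, not recommendations):

```python
# Kafka 0.8 producer properties for sync sends with acknowledgements.
sync_producer_props = {
    "metadata.broker.list": "broker1:9092,broker2:9092",  # placeholder hosts
    "producer.type": "sync",              # block on each send
    "request.required.acks": "1",         # wait for the leader's ack
    "message.send.max.retries": "3",      # retries refresh metadata between attempts
    "serializer.class": "kafka.serializer.StringEncoder",
}
```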

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Yeah, I am using ack = 0, so that makes sense. I'll need to rethink that, it would seem. It would be nice, wouldn't it, in this case, for the broker to realize this and just forward the messages to the correct leader. Would that be possible? Also, it would be nice to have a second option to the