Commented on the jira.
Thanks,
Jun
On Sat, Jun 29, 2013 at 6:21 AM, Jason Rosenberg wrote:
> I added this scenario to KAFKA-955.
>
> I'm thinking that this scenario could be a problem for ack=0 in general
> (even without controlled shutdown). If we do an "uncontrolled" shutdown,
> it seems that some topics won't ever know there could have been a leader
> change.
I added this scenario to KAFKA-955.
I'm thinking that this scenario could be a problem for ack=0 in general
(even without controlled shutdown). If we do an "uncontrolled" shutdown,
it seems that some topics won't ever know there could have been a leader
change. Would it make sense to force a metadata refresh?
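For concreteness, a minimal sketch of the existing knob for this, assuming the
0.8 Java producer API (kafka.javaapi.producer.Producer); the broker hosts and
topic below are made up. Lowering topic.metadata.refresh.interval.ms makes the
producer re-fetch metadata more often, but as I understand it the refresh is
piggybacked on sends, so it narrows rather than closes the ack=0 window:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class MetadataRefreshSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker hosts.
        props.put("metadata.broker.list", "serverA:9092,serverB:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // ack=0: fire-and-forget, no error feedback from the broker.
        props.put("request.required.acks", "0");
        // Poll metadata every minute instead of the 10-minute default.
        // The refresh only happens after a send, so an idle producer
        // still won't notice a leader change.
        props.put("topic.metadata.refresh.interval.ms", "60000");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("topicX", "hello"));
        producer.close();
    }
}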
Also, looking back at my logs, I'm wondering if a producer will reuse the
same socket to send data to the same broker, for multiple topics (I'm
guessing yes). In which case, it looks like I'm seeing this scenario:
1. producer1 is happily sending messages for topicX and topicY to serverA
(serverA
Filed https://issues.apache.org/jira/browse/KAFKA-955
On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg wrote:
> Jun,
>
> To be clear, this whole discussion was started, because I am clearly
> seeing "failed due to Leader not local" on the last broker restarted,
> after all the controlled shutting down has completed and all brokers
> restarted.
Jun,
To be clear, this whole discussion was started, because I am clearly seeing
"failed due to Leader not local" on the last broker restarted, after all
the controlled shutting down has completed and all brokers restarted.
This leads me to believe that a client made a metadata request and found
That should be fine since the old socket in the producer will no longer be
usable after a broker is restarted.
Thanks,
Jun
On Mon, Jun 24, 2013 at 9:50 PM, Jason Rosenberg wrote:
> What about a non-controlled shutdown, and a restart, but the producer never
> attempts to send anything during the time the broker was down?
What about a non-controlled shutdown, and a restart, but the producer never
attempts to send anything during the time the broker was down? That could
have caused a leader change, but without the producer knowing to refresh
its metadata, no?
On Mon, Jun 24, 2013 at 9:05 PM, Jun Rao wrote:
> Other than controlled shutdown, the only other case that can cause the
> leader to change when the underlying broker is alive is when the broker
> expires its ZK session (likely due to GC), which should be rare.
Other than controlled shutdown, the only other case that can cause the
leader to change when the underlying broker is alive is when the broker
expires its ZK session (likely due to GC), which should be rare. That being
said, forwarding in the broker may not be a bad idea. Could you file a jira
to track this?
Yeah,
I see that with ack=0, the producer will be in a bad state anytime the
leader for its partition has changed, while the broker that it thinks is
the leader is still up. So this is a problem in general, not only for
controlled shutdown, but even for the case where you've restarted a server
(
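For comparison, a hedged sketch of the same producer with ack=1 (same assumed
0.8 Java API, made-up broker hosts): the stale leader now returns an error
instead of silently dropping the message, and the producer refreshes its
metadata and retries:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AckOneSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker hosts.
        props.put("metadata.broker.list", "serverA:9092,serverB:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // ack=1: wait for the leader's response, so errors surface.
        props.put("request.required.acks", "1");
        // On failure the producer refreshes metadata and retries.
        props.put("message.send.max.retries", "3");
        props.put("retry.backoff.ms", "100");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("topicX", "key", "value"));
        producer.close();
    }
}

The cost is that each request now waits for the leader's response; async mode
(producer.type=async) can amortize that if latency is a concern.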
I think Jason was suggesting quiescent time as a possibility only if the
broker did request forwarding when it is not the leader.
On Monday, June 24, 2013, Jun Rao wrote:
> Jason,
>
> The quiescence time that you proposed won't work. The reason is that with
> ack=0, the producer starts losing data
Jason,
The quiescence time that you proposed won't work. The reason is that with
ack=0, the producer starts losing data silently from the moment the leader
is moved (by controlled shutdown) until the broker is shut down. So, the
sooner that you can shut down the broker, the better. What we realize
After we implement non-blocking IO for the producer, there may not be much
incentive left to use ack = 0, but this is an interesting idea - not just
for the controlled shutdown case, but also when leadership moves due to
say, a broker's zk session expiring. Will have to think about it a bit more.
Yeah I am using ack = 0, so that makes sense. I'll need to rethink that,
it would seem. It would be nice, wouldn't it, in this case, for the broker
to realize this and just forward the messages to the correct leader. Would
that be possible?
Also, it would be nice to have a second option to the
Jason,
Are you using ack = 0 in the producer? This mode doesn't work well with
controlled shutdown (this is explained in the FAQ in
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#)
Thanks,
Jun
On Sun, Jun 23, 2013 at 1:45 AM, Jason Rosenberg wrote:
> I'm working on tryi
Hi Sriram,
I don't see any indication at all on the producer that there's a problem.
Only the above logging on the server (and it repeats continually). I
think what may be happening is that the producer for that topic did not
actually try to send a message between the start of the controlled shutdown
Hey Jason,
The producer on failure initiates a metadata request to refresh its state
and should issue subsequent requests to the new leader. The errors that
you see should only happen once per topic partition per producer. Let me
know if this is not what you see. On the producer end you should see
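To make that concrete, a hedged sketch of what the caller sees with the
assumed 0.8 Java producer (property and exception names from that client,
host made up): each failed attempt triggers a metadata refresh before the
retry, and only after the retries are exhausted does the send throw. With
ack=0 none of this surfaces, which would match seeing nothing on the
producer side:

import java.util.Properties;

import kafka.common.FailedToSendMessageException;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SendFailureSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "serverA:9092"); // hypothetical host
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        props.put("message.send.max.retries", "3");
        props.put("retry.backoff.ms", "100");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        try {
            producer.send(new KeyedMessage<String, String>("topicX", "payload"));
        } catch (FailedToSendMessageException e) {
            // Thrown only after all retries (and metadata refreshes) failed.
            System.err.println("send failed after retries: " + e.getMessage());
        } finally {
            producer.close();
        }
    }
}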