Produce request failed due to NotLeaderForPartitionException (leader epoch is old)

2015-10-27 Thread Chaitanya GSK
Hi Ivan, I'm facing the same issue you had. Were you able to figure out why it's happening? If so, could you please share it? Thanks, Chaitanya GSK

Produce request failed due to NotLeaderForPartitionException (leader epoch is old)

2015-06-16 Thread Ivan Balashov
Hi, During a round of Kafka data discrepancy investigation I came across a bunch of recurring errors like the one below, from producer.log: 2015-06-14 13:06:25,591 WARN [task-thread-9] (k.p.a.DefaultEventHandler:83) - Produce request with correlation id 624 failed due to [mytopic,21]: kafka.common.NotLeader

Re: Produce request failed

2014-09-04 Thread Ryan Williams
Thanks for looking, and confirming. The latest on bpot/poseidon resolves this problem (but has others, unfortunately), so I'm pretty sure it's on the client side. I'm working to patch it and get back on track for producing from Ruby apps. On Thu, Sep 4, 2014 at 8:20 AM, Jun Rao wrote: > It seems t

Re: Produce request failed

2014-09-04 Thread Jun Rao
It seems that the producer is trying to send data for Partition [topic1,1] to broker 3. However, that partition is hosted on broker 1. Could you try the Java client on the same broker? If that works, the issue is likely in the Ruby producer, possibly due to incorrect metadata handling. Thanks, Ju
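
To run the cross-check Jun suggests with the plain Java client, a minimal 0.8-era producer looks roughly like the sketch below (the broker list, topic name, and key are placeholders; the old kafka.javaapi.producer API is assumed, since this thread predates the new producer):

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ProduceTest {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Seed brokers used only for metadata discovery; hostnames are placeholders.
            props.put("metadata.broker.list", "kafka1:9092,kafka2:9092,kafka3:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "1"); // wait for the partition leader's ack

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            // A keyed send, so the message is routed to a specific partition of topic1.
            producer.send(new KeyedMessage<String, String>("topic1", "some-key", "test message"));
            producer.close();
        }
    }

If this succeeds against the same cluster, the metadata handling in the Ruby client is the likely suspect, as Jun notes.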

Produce request failed

2014-09-02 Thread Ryan Williams
I have a 3-node Kafka cluster running 0.8.1.1, recently updated from 0.8.1, and I'm noticing now that producing from Ruby/Poseidon is having trouble. If I'm reading correctly, it appears that Poseidon is attempting to produce to partition 1 via kafka3, but partition 1 is on kafka1. Does this look l

Re: produce request failed: due to Leader not local for partition

2013-06-30 Thread Jun Rao
Commented on the jira. Thanks, Jun On Sat, Jun 29, 2013 at 6:21 AM, Jason Rosenberg wrote: > I added this scenario to KAFKA-955. > > I'm thinking that this scenario could be a problem for ack=0 in general > (even without controlled shutdown). If we do an "uncontrolled" shutdown, > it seems t

Re: produce request failed: due to Leader not local for partition

2013-06-29 Thread Jason Rosenberg
I added this scenario to KAFKA-955. I'm thinking that this scenario could be a problem for ack=0 in general (even without controlled shutdown). If we do an "uncontrolled" shutdown, it seems that some topics won't ever know there could have been a leader change. Would it make sense to force a met
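
For what it's worth, the 0.8 producer already re-fetches metadata periodically, not just on errors, via topic.metadata.refresh.interval.ms; the catch is that the refresh only happens on a send, so an idle producer can still hold stale leadership. A sketch of tightening that interval (property names from the 0.8 producer config; values are illustrative):

    import java.util.Properties;
    import kafka.producer.ProducerConfig;

    public class ProducerMetadataRefresh {
        static ProducerConfig buildConfig() {
            Properties props = new Properties();
            props.put("metadata.broker.list", "kafka1:9092,kafka2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Default is 10 minutes; a smaller value makes the producer re-discover
            // partition leadership sooner, but only when it actually sends something.
            props.put("topic.metadata.refresh.interval.ms", "60000");
            return new ProducerConfig(props);
        }
    }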

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Also, looking back at my logs, I'm wondering if a producer will reuse the same socket to send data to the same broker for multiple topics (I'm guessing yes). If so, it looks like I'm seeing this scenario: 1. producer1 is happily sending messages for topicX and topicY to serverA (serverA

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Filed https://issues.apache.org/jira/browse/KAFKA-955 On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg wrote: > Jun, > > To be clear, this whole discussion was started, because I am clearly > seeing "failed due to Leader not local" on the last broker restarted, > after all the controlled shutt

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Jun, To be clear, this whole discussion was started because I am clearly seeing "failed due to Leader not local" on the last broker restarted, after all the controlled shutdowns have completed and all brokers have restarted. This leads me to believe that a client made a metadata request and found

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jun Rao
That should be fine since the old socket in the producer will no longer be usable after a broker is restarted. Thanks, Jun On Mon, Jun 24, 2013 at 9:50 PM, Jason Rosenberg wrote: > What about a non-controlled shutdown, and a restart, but the producer never > attempts to send anything during t

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
What about a non-controlled shutdown, and a restart, but the producer never attempts to send anything during the time the broker was down? That could have caused a leader change, but without the producer knowing to refresh its metadata, no? On Mon, Jun 24, 2013 at 9:05 PM, Jun Rao wrote: > Ot

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jun Rao
Other than controlled shutdown, the only other case that can cause the leader to change when the underlying broker is alive is when the broker expires its ZK session (likely due to GC), which should be rare. That being said, forwarding in the broker may not be a bad idea. Could you file a jira to t

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Yeah, I see that with ack=0, the producer will be in a bad state anytime the leader for its partition has changed, while the broker that it thinks is the leader is still up. So this is a problem in general, not only for controlled shutdown, but even for the case where you've restarted a server (

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Joel Koshy
I think Jason was suggesting quiescent time as a possibility only if the broker did request forwarding when it is not the leader. On Monday, June 24, 2013, Jun Rao wrote: > Jason, > > The quiescence time that you proposed won't work. The reason is that with > ack=0, the producer starts losing data

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jun Rao
Jason, The quiescence time that you proposed won't work. The reason is that with ack=0, the producer starts losing data silently from the moment the leader is moved (by controlled shutdown) until the broker is shut down. So, the sooner that you can shut down the broker, the better. What we realize

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Joel Koshy
After we implement non-blocking IO for the producer, there may not be much incentive left to use ack = 0, but this is an interesting idea - not just for the controlled shutdown case, but also when leadership moves due to, say, a broker's ZK session expiring. Will have to think about it a bit more.

Re: produce request failed: due to Leader not local for partition

2013-06-24 Thread Jason Rosenberg
Yeah I am using ack = 0, so that makes sense. I'll need to rethink that, it would seem. It would be nice, wouldn't it, in this case, for the broker to realize this and just forward the messages to the correct leader. Would that be possible? Also, it would be nice to have a second option to the

Re: produce request failed: due to Leader not local for partition

2013-06-23 Thread Jun Rao
Jason, Are you using ack = 0 in the producer? This mode doesn't work well with controlled shutdown (this is explained in the FAQ in https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#). Thanks, Jun On Sun, Jun 23, 2013 at 1:45 AM, Jason Rosenberg wrote: > I'm working on tryi
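
The FAQ's point, in config terms: with request.required.acks=0 the producer never hears back about a send, so it keeps writing to the old leader; with acks=1 (or -1) the failed send surfaces as an error, which triggers the metadata refresh and retry. A hedged sketch using the 0.8 producer property names:

    import java.util.Properties;
    import kafka.producer.ProducerConfig;

    public class AckSetting {
        static ProducerConfig buildConfig() {
            Properties props = new Properties();
            props.put("metadata.broker.list", "kafka1:9092,kafka2:9092,kafka3:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // "0" = fire-and-forget: no error feedback, so stale leadership goes unnoticed.
            // "1" = wait for the leader's ack: a NotLeader error comes back, and the
            //       producer refreshes its metadata and resends to the new leader.
            props.put("request.required.acks", "1");
            return new ProducerConfig(props);
        }
    }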

Re: produce request failed: due to Leader not local for partition

2013-06-23 Thread Jason Rosenberg
Hi Sriram, I don't see any indication at all on the producer that there's a problem. Only the above logging on the server (and it repeats continually). I think what may be happening is that the producer for that topic did not actually try to send a message between the start of the controlled shu

Re: produce request failed: due to Leader not local for partition

2013-06-23 Thread Sriram Subramanian
Hey Jason, The producer on failure initiates a metadata request to refresh its state and should issue subsequent requests to the new leader. The errors that you see should only happen once per topic partition per producer. Let me know if this is not what you see. On the producer end you should see
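
The refresh-and-resend cycle Sriram describes is bounded by the producer's retry settings; if that behaviour matters for you, these are the relevant knobs (0.8 producer property names, illustrative values):

    import java.util.Properties;
    import kafka.producer.ProducerConfig;

    public class RetrySettings {
        static ProducerConfig buildConfig() {
            Properties props = new Properties();
            props.put("metadata.broker.list", "kafka1:9092,kafka2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "1");    // errors must be visible for a retry to happen
            props.put("message.send.max.retries", "3"); // resend attempts after a failed send
            props.put("retry.backoff.ms", "100");       // pause before retrying, to let leader election settle
            return new ProducerConfig(props);
        }
    }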

produce request failed: due to Leader not local for partition

2013-06-23 Thread Jason Rosenberg
I'm working on having seamless rolling restarts for my Kafka servers, running 0.8. I have it so that each server will be restarted sequentially. Each server takes itself out of the load balancer (e.g. sets a status that the lb will recognize, and then waits more than long enough for the
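
For context on the controlled-shutdown step itself: in the 0.8.0 era it is triggered externally with the kafka.admin.ShutdownBroker admin tool, while from 0.8.1 onward the broker can do it automatically on shutdown. A sketch of the broker-side settings (property names assumed from the broker config; values are illustrative):

    # server.properties excerpt: move partition leadership off this broker
    # before it stops, retrying a few times if leaders cannot be moved yet.
    controlled.shutdown.enable=true
    controlled.shutdown.max.retries=3
    controlled.shutdown.retry.backoff.ms=5000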