Other than controlled shutdown, the only case that can cause the leader to change while the underlying broker is still alive is when the broker expires its ZK session (likely due to GC), which should be rare. That being said, forwarding in the broker may not be a bad idea. Could you file a jira to track this?
Thanks,

Jun


On Mon, Jun 24, 2013 at 2:50 PM, Jason Rosenberg <j...@squareup.com> wrote:

> Yeah,
>
> I see that with ack=0, the producer will be in a bad state anytime the leader for its partition has changed, while the broker that it thinks is the leader is still up. So this is a problem in general, not only for controlled shutdown, but even for the case where you've restarted a server (without controlled shutdown), which in and of itself can force a leader change. If the producer doesn't attempt to send a message during the time the broker was down, it will never get a connection failure, never get fresh metadata, and will subsequently start sending messages to the non-leader.
>
> Thus, I'd say this is a problem with ack=0, regardless of controlled shutdown. Any time there's a leader change, the producer will send messages into the ether. I think this is actually a severe condition that could be considered a bug. How hard would it be to have the receiving broker forward on to the leader, in this case?
>
> Jason
>
>
> On Mon, Jun 24, 2013 at 8:44 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > I think Jason was suggesting quiescent time as a possibility only if the broker did request forwarding if it is not the leader.
> >
> > On Monday, June 24, 2013, Jun Rao wrote:
> >
> > > Jason,
> > >
> > > The quiescence time that you proposed won't work. The reason is that with ack=0, the producer starts losing data silently from the moment the leader is moved (by controlled shutdown) until the broker is shut down. So, the sooner that you can shut down the broker, the better. What we realized is that if you can use a larger batch size, ack=1 can still deliver very good throughput.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Jun 24, 2013 at 12:22 AM, Jason Rosenberg <j...@squareup.com> wrote:
> > >
> > > > Yeah, I am using ack = 0, so that makes sense. I'll need to rethink that, it would seem. It would be nice, wouldn't it, in this case, for the broker to realize this and just forward the messages to the correct leader. Would that be possible?
> > > >
> > > > Also, it would be nice to have a second option to the controlled shutdown (e.g. controlled.shutdown.quiescence.ms), to allow the broker to wait, after the controlled shutdown, a prescribed amount of time before actually shutting down the server. Then, I could set this value to something a little greater than the producer's 'topic.metadata.refresh.interval.ms'. This would help with hitless rolling restarts too. Currently, every producer gets a very loud "Connection Reset" with a tall stack trace each time I restart a broker. It would be nicer to have the producers still be able to produce until the metadata refresh interval expires, then get the word that the leader has moved due to the controlled shutdown, and then start producing to the new leader, all before the shutting-down server actually shuts down. Does that seem feasible?
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Sun, Jun 23, 2013 at 8:23 PM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > Jason,
> > > > >
> > > > > Are you using ack = 0 in the producer?
> > > > > This mode doesn't work well with controlled shutdown (this is explained in the FAQ at https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#).
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Sun, Jun 23, 2013 at 1:45 AM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > >
> > > > > > I'm working on having seamless rolling restarts for my kafka servers, running 0.8. I have it so that each server will be restarted sequentially. Each server takes itself out of the load balancer (e.g. sets a status that the lb will recognize, and then waits more than long enough for the lb to stop sending meta-data requests to that server). Then I initiate the shutdown (with controlled.shutdown.enable=true). This seems to work well; however, I occasionally see warnings like this in the log from the server, after restart:
> > > > > >
> > > > > > 2013-06-23 08:28:46,770 WARN [kafka-request-handler-2] server.KafkaApis - [KafkaApi-508818741] Produce request with correlation id 7136261 from client on partition [mytopic,0] failed due to Leader not local for partition [mytopic,0] on broker 508818741
> > > > > >
> > > > > > This WARN seems to persistently repeat, until the producer client initiates a new meta-data request (e.g. every 10 minutes, by default). However, the producer doesn't log any errors/exceptions while the server is logging this WARN.
> > > > > >
> > > > > > What's happening here? Is the message silently being forwarded on to the correct leader for the partition? Is the message dropped? Are these WARNs particularly useful?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jason
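
(A minimal sketch of the 0.8 producer settings this thread converges on: request.required.acks=1 plus a larger async batch instead of ack=0, along with the metadata refresh interval that bounds how long an ack=0 producer keeps writing to a stale leader. The broker list, topic name, class name, and batch size below are illustrative placeholders, not values taken from the thread.)

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class Ack1ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list and topic; substitute your own.
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        // ack=1: the leader acknowledges each request, so "Leader not local"
        // comes back to the producer as an error it can retry against fresh
        // metadata, instead of the silent loss seen with ack=0.
        props.put("request.required.acks", "1");

        // A larger async batch recovers most of the throughput given up by
        // moving from ack=0 to ack=1, as Jun suggests above.
        props.put("producer.type", "async");
        props.put("batch.num.messages", "500");

        // Default is 10 minutes; with ack=0 this is roughly how long a
        // producer can keep writing to a stale leader before it notices.
        props.put("topic.metadata.refresh.interval.ms", "600000");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("mytopic", "test message"));
        producer.close();
    }
}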