Not sure, but should the combination of auto.leader.rebalance.enable=true and controlled.shutdown.enable=true sort this out for you?
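
Something like this on the broker side is what I have in mind; a sketch only, so double-check the names and defaults against your broker version before relying on it:

    # server.properties (broker side), illustrative only
    controlled.shutdown.enable=true            # broker asks the controller to move leadership away before it stops
    controlled.shutdown.max.retries=3          # retry the controlled shutdown if it does not complete
    controlled.shutdown.retry.backoff.ms=5000  # pause between those retries
    auto.leader.rebalance.enable=true          # controller periodically moves leadership back to the preferred replicas

Controlled shutdown should hand leadership off before the process exits, and auto leader rebalance should move it back once the replacement broker has caught up.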
2016-01-13 1:13 GMT+01:00 Scott Reynolds <sreyno...@twilio.com>:

> we use 0.9.0.0 and it is working fine. Not all the features work and a few
> things make a few assumptions about how zookeeper is used. But as a tool
> for provisioning, expanding and failure recovery it is working fine so far.
>
> *knocks on wood*
>
> On Tue, Jan 12, 2016 at 4:08 PM, Luke Steensen <
> luke.steen...@braintreepayments.com> wrote:
>
> > Ah, that's a good idea. Do you know if kafka-manager works with kafka 0.9
> > by chance? That would be a nice improvement over the cli tools.
> >
> > Thanks,
> > Luke
> >
> > On Tue, Jan 12, 2016 at 4:53 PM, Scott Reynolds <sreyno...@twilio.com>
> > wrote:
> >
> > > Luke,
> > >
> > > We practice the same immutable pattern on AWS. To decommission a
> > > broker, we use partition reassignment to move the partitions off of the
> > > node, followed by a preferred leadership election. To do this with a
> > > web UI, so that you can handle it on lizard brain at 3 am, we have the
> > > Yahoo Kafka Manager running on the broker hosts.
> > >
> > > https://github.com/yahoo/kafka-manager
> > >
> > > On Tue, Jan 12, 2016 at 2:50 PM, Luke Steensen <
> > > luke.steen...@braintreepayments.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > We've run into a bit of a head-scratcher with a new kafka deployment
> > > > and I'm curious if anyone has any ideas.
> > > >
> > > > A little bit of background: this deployment uses "immutable
> > > > infrastructure" on AWS, so instead of configuring the host in-place,
> > > > we stop the broker, tear down the instance, and replace it wholesale.
> > > > My understanding was that controlled shutdown combined with producer
> > > > retries would allow this operation to be zero-downtime.
> > > > Unfortunately, things aren't working quite as I expected.
> > > >
> > > > After poring over the logs, I pieced together the following chain of
> > > > events:
> > > >
> > > > 1. our operations script stops the broker process and proceeds to
> > > >    terminate the instance
> > > > 2. our producer application detects the disconnect and requests
> > > >    updated metadata from another node
> > > > 3. updated metadata is returned successfully, but the downed broker
> > > >    is still listed as leader for a single partition of the given
> > > >    topic
> > > > 4. on the next produce request bound for that partition, the producer
> > > >    attempts to initiate a connection to the downed host
> > > > 5. because the instance has been terminated, the node is now in the
> > > >    "connecting" state until the system-level tcp timeout expires
> > > >    (2-3 minutes)
> > > > 6. during this time, all produce requests to the given partition sit
> > > >    in the record accumulator until they expire and are immediately
> > > >    failed without retries
> > > > 7. the tcp timeout finally fires, the node is recognized as
> > > >    disconnected, more metadata is fetched, and things return to
> > > >    sanity
> > > >
> > > > I was able to work around the issue by waiting 60 seconds between
> > > > shutting down the broker and terminating the instance, as well as
> > > > raising request.timeout.ms on the producer to 2x our zookeeper
> > > > timeout. This gives the producer a much quicker "connection refused"
> > > > error instead of the connection timeout and seems to give enough time
> > > > for normal failure detection and leader election to kick in before
> > > > requests are timed out.
> > > > So two questions, really: (1) are there any known issues that would
> > > > cause a controlled shutdown to fail to release leadership of all
> > > > partitions? and (2) should the producer be timing out connection
> > > > attempts more proactively?
> > > >
> > > > Thanks,
> > > > Luke
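
For reference, the producer-side settings Luke's workaround touches would look roughly like this with the new Java producer; the values are illustrative only (the 2x-zookeeper-timeout figure comes from his mail above, not a general recommendation):

    # producer config (new Java producer), illustrative values only
    bootstrap.servers=broker1:9092,broker2:9092   # placeholder hosts
    retries=3                   # let a failed send be retried once a new leader is elected
    retry.backoff.ms=500        # small pause between retries while leadership settles
    request.timeout.ms=60000    # per the workaround above: roughly 2x the zookeeper session timeout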