Hi,

Thanks for your reply.

We have 4 deployment phases, and in each phase we can take down a few machines. These releases happen every 2 weeks because, besides the core system (Kafka, in this case), every machine also runs a number of other microservices.
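Because each phase takes a whole group of brokers down together, the risky case is a partition whose replicas all sit inside that group. A minimal stdlib-only sketch of such a pre-phase check follows. `PhaseCheck`, `partitionsFullyOffline` and the "topic-partition" keys are hypothetical; the replica lists are assumed to come from something like `kafka-topics.sh --describe` (parsing that output is left out). Note that a surviving replica is only a candidate: Kafka elects the new leader from the in-sync replicas unless unclean leader election is enabled.

```java
import java.util.*;

// Hypothetical helper: flags partitions whose every replica lives on a broker
// that is scheduled to go down in the same deploy phase.
class PhaseCheck {

    // assignment: "topic-partition" -> replica broker ids (assumed input);
    // phaseBrokers: broker ids that are taken down together in one phase.
    static List<String> partitionsFullyOffline(Map<String, List<Integer>> assignment,
                                               Set<Integer> phaseBrokers) {
        List<String> atRisk = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : assignment.entrySet()) {
            // If no replica lives outside the phase group, the partition has
            // no possible leader while this phase is in progress.
            boolean survivor = e.getValue().stream()
                    .anyMatch(broker -> !phaseBrokers.contains(broker));
            if (!survivor) {
                atRisk.add(e.getKey());
            }
        }
        Collections.sort(atRisk); // deterministic order for reporting
        return atRisk;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        assignment.put("orders-0", Arrays.asList(1, 2));
        assignment.put("orders-1", Arrays.asList(2, 3));
        assignment.put("orders-2", Arrays.asList(3, 4));
        // Phase 1 takes brokers 1 and 2 down together.
        Set<Integer> phase1 = new HashSet<>(Arrays.asList(1, 2));
        System.out.println(partitionsFullyOffline(assignment, phase1)); // [orders-0]
    }
}
```

Run before each phase, a non-empty result means that phase would leave those partitions without any leader.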
My only concern is that at runtime, i.e. between 2 releases, the replica distribution per topic can become skewed because of restarts or occasional machine failures/reboots. Because of that, the steps you've mentioned would become an operational nightmare for us.

What I'm looking for is a more automated solution. For example, even if all the replicas of a partition are down, the producers (running on around 50 machines) should switch to the other available partitions until this partition becomes available again. Likewise, on the consumer side, consumers should not fail but should keep consuming from the available partitions until this partition comes back up.

Is this possible with the new producer and the new consumer or the high-level consumer?

Thanks,
Prabhjot

On Nov 27, 2015 12:00 AM, "Ben Stopford" <b...@confluent.io> wrote:
> Hi Prabhjot
>
> I may have slightly misunderstood your question, so apologies if that’s the
> case. The general approach to releases is to use a rolling upgrade, where
> you take one machine offline at a time, restart it, wait for it to come
> back online (you can monitor this via JMX), then move on to the next. If you’re
> taking multiple machines offline at the same time, you need to be careful
> about where the replicas for those machines reside. You can examine these
> individually for each topic via kafka-topics.sh.
>
> Regarding your questions, the following points may be of use:
>
> - Only one replica (the leader) will be available for writing at any one
> time in Kafka. If you take machines offline, Kafka will switch over to
> replicas on other machines if they are available.
> - The behaviour of produce requests will depend on the acknowledgment
> setting the producer provides, the setting for minimum in-sync replicas, and
> how many replicas remain standing after the failure. There are a few things
> going on here, but they’re explained quite well here:
> <http://kafka.apache.org/090/documentation.html#design_ha>.
> - Consumers also consume from the leader, so if the leader for a partition
> is online you will be able to consume from it. If the leader is on a
> machine that goes offline, consumption will pause whilst leadership
> switches over to a replica.
>
> All the best
> B
>
>
> On 26 Nov 2015, at 17:58, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:
> >
> > Hi,
> >
> > Requesting your expertise on these doubts of mine.
> >
> > Thanks,
> > Prabhjot
> >
> > On Thu, Nov 26, 2015 at 4:43 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> We arrange our Kafka machines in groups and deploy in phases.
> >>
> >> For Kafka, we'll have to map groups to phases. During each phase of the
> >> release, all the machines in that group can go down.
> >>
> >> When this happens, there are a couple of cases:
> >>
> >> 1. All replicas reside in a group of machines that will all go
> >>    down in this phase
> >>    - Effect on the producer:
> >>      - What happens to produce requests (can the producer dynamically
> >>        keep writing to the remaining partitions)?
> >>      - What happens to already queued requests that were being sent to
> >>        the earlier replicas? They will fail (we'll have to use the
> >>        producer callback feature to retry, provided the above step
> >>        works fine).
> >>    - Effect on the consumer:
> >>      - Can the consumers consume from a smaller number of partitions?
> >>      - Does the consumer 'consume' API give any callback/failure
> >>        when all replicas of a partition go down?
> >>
> >> If you have come across any of the above cases, please share how you
> >> solved the problem, or let us know whether everything just works with
> >> Kafka during deployments and the cases described above are invalid or
> >> handled by Kafka and its clients internally.
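Ben's point that produce behaviour depends on the `acks` setting, `min.insync.replicas` and how many replicas survive can be summarised as a small decision model. This is a simplified sketch, not client code: `ProduceOutcome` and its result names are hypothetical, and it ignores timeouts, retries and the delay while a new leader is elected.

```java
// Simplified, hypothetical model of what a produce request to one partition
// sees after brokers go down. Not Kafka client code.
class ProduceOutcome {
    enum Result { ACCEPTED, NO_LEADER, NOT_ENOUGH_REPLICAS }

    // acks: producer setting "0", "1", "all" or "-1" (same as "all");
    // leaderAvailable: whether the partition currently has a leader;
    // inSyncReplicas: size of the ISR, leader included;
    // minInsync: the min.insync.replicas setting.
    static Result outcome(String acks, boolean leaderAvailable,
                          int inSyncReplicas, int minInsync) {
        if (!leaderAvailable) {
            // All replicas are down: records for this partition cannot be sent,
            // and the send callback eventually reports an error.
            return Result.NO_LEADER;
        }
        boolean acksAll = acks.equals("all") || acks.equals("-1");
        if (acksAll && inSyncReplicas < minInsync) {
            // min.insync.replicas is only enforced when acks=all.
            return Result.NOT_ENOUGH_REPLICAS;
        }
        return Result.ACCEPTED;
    }

    public static void main(String[] args) {
        // 2 of 3 replicas down, acks=all, min.insync.replicas=2: rejected.
        System.out.println(outcome("all", true, 1, 2)); // NOT_ENOUGH_REPLICAS
        // Same partition state with acks=1: the leader alone accepts it.
        System.out.println(outcome("1", true, 1, 2));   // ACCEPTED
        // Every replica of the partition is down.
        System.out.println(outcome("1", false, 0, 2));  // NO_LEADER
    }
}
```

The trade-off this exposes: `acks=all` with a higher `min.insync.replicas` turns partial outages into explicit errors instead of writes that only the leader holds.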
> >>
> >> Thanks,
> >> Prabhjot
> >>
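Regarding the request that producers switch to partitions that are still available: as far as I know, the Java producer's default partitioner already does this for records without a key, choosing only among partitions that currently have a leader. The stdlib-only sketch below models that round-robin choice; `AvailablePartitionChooser` and its inputs are hypothetical and are not Kafka's `Partitioner` interface.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "send unkeyed records to whichever partition
// still has a leader". Not the Kafka Partitioner API.
class AvailablePartitionChooser {
    private final AtomicInteger counter = new AtomicInteger(0);

    // allPartitions: partition ids of the topic;
    // available: ids of partitions that currently have a leader.
    int choose(List<Integer> allPartitions, Set<Integer> available) {
        if (allPartitions.isEmpty()) {
            throw new IllegalArgumentException("topic has no partitions");
        }
        List<Integer> candidates = new ArrayList<>();
        for (int p : allPartitions) {
            if (available.contains(p)) {
                candidates.add(p);
            }
        }
        // If no partition has a leader, fall back to all partitions.
        List<Integer> pool = candidates.isEmpty() ? allPartitions : candidates;
        int next = counter.getAndIncrement();
        // floorMod keeps the index non-negative even after the counter overflows.
        return pool.get(Math.floorMod(next, pool.size()));
    }

    public static void main(String[] args) {
        AvailablePartitionChooser chooser = new AvailablePartitionChooser();
        List<Integer> all = Arrays.asList(0, 1, 2, 3);
        // Partition 1 has lost all of its replicas, so it has no leader.
        Set<Integer> available = new HashSet<>(Arrays.asList(0, 2, 3));
        for (int i = 0; i < 4; i++) {
            System.out.print(chooser.choose(all, available) + " "); // prints 0 2 3 0
        }
        System.out.println();
    }
}
```

Keyed records are a different matter: as far as I know they are hashed over all partitions so that a key always lands on the same partition, and skipping an unavailable partition would break that per-key mapping. On the consumer side, per Ben's reply, consumption continues from partitions that have a leader and pauses only for the one whose leader is gone.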