Just for more info: if I have 10 servers in a cluster, do we need replication-factor = 10 for the most fault-tolerant cluster? This also relates to rebalancing when scaling a Kafka cluster: when we add a server to the cluster, do we also need to increase the number of partitions in our topics?
Best,
Hafsa

2016-06-01 13:55 GMT+02:00 Ben Stopford <b...@confluent.io>:

> Pretty much. It's not actually related to ZooKeeper.
>
> Generalising a bit, replication factor 2 means Kafka can lose 1 machine
> and be ok.
>
> B
>
> On 1 Jun 2016, at 12:46, Hafsa Asif <hafsa.a...@matchinguu.com> wrote:
> >
> > So, it means that I should create topics with at least
> > replication-factor=2, regardless of how many servers are in the Kafka
> > cluster. Then, if any server goes down or slows down, ZooKeeper will
> > not go out of sync.
> > Currently, all my topics have replication-factor=1 and I have an issue
> > where ZooKeeper goes out of sync. So, will increasing the
> > replication factor solve the issue?
> >
> > Hafsa
> >
> > 2016-06-01 12:57 GMT+02:00 Ben Stopford <b...@confluent.io>:
> >
> >> Hi Hafsa
> >>
> >> If you create a topic with replication-factor = 2, you can lose one of
> >> the replicas without losing data, so long as they were "in sync".
> >> Replicas can fall out of sync if one of the machines runs slow. The
> >> system tracks in-sync replicas. These are exposed by JMX too. Check
> >> out the docs on replication for more details:
> >>
> >> http://kafka.apache.org/090/documentation.html#replication
> >>
> >> B
> >>
> >>> On 1 Jun 2016, at 10:45, Hafsa Asif <hafsa.a...@matchinguu.com> wrote:
> >>>
> >>> Hello Jayesh,
> >>>
> >>> Thank you very much for such a good description. My further questions
> >>> are (just to be clear about the concept):
> >>>
> >>> 1. If I have only one partition in a topic, created with the following
> >>> configuration,
> >>> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic1
> >>> do I still need to rebalance topic partitions when adding or removing
> >>> nodes in the Kafka cluster?
> >>>
> >>> 2. What is the actual meaning of this line: 'if all your topics have
> >>> at least 2 in-sync replicas'?
> >>> My understanding is that I need to create a replica of each topic
> >>> on each server. E.g., if I have two servers in a Kafka cluster, then
> >>> I need to create topic 'mytopic1' on both servers. That helps avoid
> >>> any problem when removing one of the servers.
> >>>
> >>> I will look in detail into your provided link. Many thanks for this.
> >>>
> >>> Looking forward to answers from other Kafka ninjas as well :)
> >>>
> >>> Best,
> >>> Hafsa
> >>>
> >>> 2016-05-31 18:50 GMT+02:00 Thakrar, Jayesh <jthak...@conversantmedia.com>:
> >>>
> >>>> Hafsa, Florin
> >>>>
> >>>> First things first: it is possible to scale a Kafka cluster up or
> >>>> down (i.e. add/remove servers).
> >>>> And as has been noted in this thread, after you add a server to a
> >>>> cluster, you need to rebalance the topic partitions in order to put
> >>>> the newly added server into use.
> >>>> Similarly, before you remove a server, it is advised that you drain
> >>>> off the data from the server to be removed (it's not a hard
> >>>> requirement if all your topics have at least 2 in-sync replicas,
> >>>> including the server being removed, and you intend to rebalance
> >>>> after server removal).
> >>>>
> >>>> However, "automating" the rebalancing of topic partitions is not
> >>>> trivial.
> >>>>
> >>>> There is a KIP out there to help with the rebalancing, but it lacks
> >>>> details:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+rebalancing
> >>>> My guess is that this is due to its non-trivial nature AND the
> >>>> number of cases one needs to take care of, e.g. scaling up by 5% vs.
> >>>> scaling up by 50% in, say, a 20-node cluster.
> >>>> Furthermore, to be really effective, one needs to be cognizant of the
> >>>> partition sizes, and with rack-awareness, the task becomes even more
> >>>> involved.
> >>>>
> >>>> Regards,
> >>>> Jayesh
> >>>>
> >>>> -----Original Message-----
> >>>> From: Spico Florin [mailto:spicoflo...@gmail.com]
> >>>> Sent: Tuesday, May 31, 2016 9:44 AM
> >>>> To: users@kafka.apache.org
> >>>> Subject: Re: Rebalancing issue while Kafka scaling
> >>>>
> >>>> Hi!
> >>>> What version of Kafka are you using? What do you mean by "Kafka needs
> >>>> rebalancing"? Rebalancing of what? Can you please be more specific?
> >>>>
> >>>> Regards,
> >>>> Florin
> >>>>
> >>>> On Tue, May 31, 2016 at 4:58 PM, Hafsa Asif <hafsa.a...@matchinguu.com>
> >>>> wrote:
> >>>>
> >>>>> Hello Folks,
> >>>>>
> >>>>> Today, my team members raised a concern that whenever we add a node
> >>>>> to the Kafka cluster, Kafka needs rebalancing. The rebalancing is a
> >>>>> somewhat manual and undesirable step whenever scaling happens.
> >>>>> Second, if Kafka scales up, then it cannot be scaled down. Please
> >>>>> give us proper guidance on this issue; maybe we are missing some
> >>>>> configuration properties.
> >>>>>
> >>>>> Hafsa
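
Two points from the thread can be sketched in a few lines of Python: the rule of thumb that replication factor 2 survives the loss of 1 machine (provided the replicas were in sync), and the observation that a newly added server only comes into use once partitions are reassigned. This is a toy model under stated assumptions, not Kafka's actual placement logic. The helper names `tolerated_failures` and `assign_round_robin` are hypothetical, and the round-robin placement is chosen purely for illustration.

```python
from typing import Dict, List


def tolerated_failures(replication_factor: int, broker_count: int) -> int:
    """Broker losses one partition can survive without losing data.

    Assumes every replica was in sync when the failures happened, as
    noted in the thread. Each replica of a partition lives on a
    different broker, so the factor cannot exceed the broker count.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > broker_count:
        raise ValueError("replication factor cannot exceed broker count")
    return replication_factor - 1


def assign_round_robin(partitions: int, replication_factor: int,
                       brokers: List[int]) -> Dict[int, List[int]]:
    """Toy replica placement: partition id -> list of broker ids.

    Illustration only (an assumption of this sketch); Kafka's real
    assignment logic is more involved.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(partitions)
    }


if __name__ == "__main__":
    # Replication factor 2 on a 10-broker cluster.
    print(tolerated_failures(2, 10))

    # Placement computed for 3 brokers never mentions a 4th broker;
    # the new broker only appears once the placement is recomputed.
    before = assign_round_robin(4, 2, [0, 1, 2])
    after = assign_round_robin(4, 2, [0, 1, 2, 3])
    print(before)
    print(after)
```

In this model, 10 servers with replication factor 10 would tolerate 9 failures, at the cost of keeping a full copy of every partition on every server. Growing the broker list leaves the partition count unchanged; only the placement differs once it is recomputed, which corresponds to the rebalancing step discussed in the thread.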