Hi Abhimanyu, I have answered your questions inline, but before that I just
want to emphasize the notions of topics and partitions, which are critical
to Kafka's resiliency and scalability.
Topics in Kafka can have multiple partitions. Each individual partition is
stored on one broker only, but the number of partitions can grow over time,
which is what makes topics scalable. Each topic partition can also be
configured to replicate to multiple brokers, and that is how data becomes
resilient to failures and outages. You can find more detailed information
in the highly recommended Kafka documentation
(https://kafka.apache.org/documentation/). I hope you find the answers
below helpful; I have also appended a few command line sketches after the
answers to illustrate them.

--Vahid

On Thu, Dec 6, 2018 at 10:19 PM Abhimanyu Nagrath <
abhimanyunagr...@gmail.com> wrote:

> Hi,
>
> I have a use case where I want to set up a Kafka cluster. Initially, at
> the start, I have 1 Kafka broker (A) and 1 ZooKeeper node. Below are my
> queries:
>
> - On adding a new Kafka broker (B) to the cluster, will all data present
>   on broker A be distributed automatically? If not, what do I need to do
>   to distribute the data?

When you add a new broker, existing data will not move automatically. In
order for the new broker to receive existing data, existing partitions
need to be moved to it manually. Kafka provides a command line tool for
(re)assigning partitions to brokers, kafka-reassign-partitions (a sketch
of its usage is included below). New topic partitions created after that
will be distributed in a way that keeps all brokers busy.

> - Now let's suppose the case above is solved and my data is distributed
>   on both brokers. Due to a maintenance issue, I want to take down
>   server B.
> - How do I transfer the data of broker B to the already existing broker
>   A or to a new broker C?

You can use the same reassignment tool to achieve that (the sketch below
notes how to drain a broker). If broker B is going to join the cluster
again, you may not need to do anything at all, assuming you created your
topic partitions with resiliency in mind (i.e. with enough replicas);
Kafka will take care of partition movements for you.

> - How can I increase the replication factor of my brokers at runtime?

Again, you can use the same tool and increase the number of brokers
assigned to each partition
(https://kafka.apache.org/documentation/#basic_ops_increase_replication_factor);
there is a sketch of that below as well.

> - How can I change the ZooKeeper IPs present in the Kafka broker config
>   at runtime without restarting Kafka?

This is not a supported operation. Ideally you are backing your Kafka
cluster with a ZooKeeper ensemble that is itself resilient to some
failures and maintenance outages.

> - How can I dynamically change the Kafka configuration at runtime?

Thanks to KIP-226
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration)
some broker configurations can be modified without a broker restart.
Details are in that document; there is also a short kafka-configs sketch
below.

> - Regarding the Kafka client:
>   - Do I need to specify all Kafka broker IPs to kafkaClient for
>     connection?

No. You only need to provide enough broker IPs to guarantee that the
client can connect to at least one of them. As long as the client can talk
to one broker, it can obtain all the information it needs to function (by
polling the cluster metadata).

>   - And each and every time a broker is added or removed, do I need to
>     add or remove its IP in the Kafka client connection string? That
>     would always require restarting my producers and consumers.

Other than providing a few bootstrap brokers, a more robust solution is to
refresh the list of available brokers at runtime. A basic approach is to
query ZooKeeper for the list of live brokers and use it to configure the
client (see the last sketch below).
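Here are the sketches. All topic names, broker IDs, and addresses are made
up for illustration, and the commands assume a Kafka 2.0 installation with
ZooKeeper reachable at localhost:2181.

First, creating a topic whose partitions are spread and replicated across
brokers (the resiliency knob mentioned at the top):

  # 3 partitions, each replicated on 2 brokers (needs at least 2 brokers).
  bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic my-topic --partitions 3 --replication-factor 2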
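Moving partitions after adding broker B, assuming broker IDs 0 (A) and
1 (B):

  # List the topics whose partitions should be considered for moving.
  echo '{"version": 1, "topics": [{"topic": "my-topic"}]}' \
    > topics-to-move.json

  # Generate a proposed assignment spreading partitions over brokers 0 and 1.
  # To drain a broker instead, simply leave its ID out of --broker-list.
  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --topics-to-move-json-file topics-to-move.json \
    --broker-list "0,1" --generate

  # Save the proposed assignment it prints as reassignment.json, then:
  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --reassignment-json-file reassignment.json --execute

  # Re-run with --verify until every reassignment is reported complete.
  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --reassignment-json-file reassignment.json --verify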
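Increasing the replication factor of an existing partition is the same
--execute step, just with a hand-written reassignment file that lists a
larger replica set per partition:

  # Grow partition 0 of my-topic from replicas [0] to replicas [0, 1].
  echo '{"version":1,"partitions":[{"topic":"my-topic","partition":0,"replicas":[0,1]}]}' \
    > increase-replication.json

  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --reassignment-json-file increase-replication.json --execute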
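Changing one of the KIP-226 dynamic broker configs without a restart
(log.cleaner.threads is just one example of a dynamically updatable
config; the KIP and the docs list which configs qualify):

  # Alter a dynamic config on broker 0 while it is running.
  bin/kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type brokers --entity-name 0 \
    --alter --add-config log.cleaner.threads=2

  # Show the dynamic overrides currently set on broker 0.
  bin/kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type brokers --entity-name 0 --describe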
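On the client side, the bootstrap list only needs a subset of the brokers;
for example, with the console consumer:

  # Two bootstrap brokers are listed only so the client can still start if
  # one of them is down; the full cluster is then discovered via metadata.
  bin/kafka-console-consumer.sh \
    --bootstrap-server brokerA:9092,brokerB:9092 \
    --topic my-topic --from-beginning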
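And lastly, a basic way to look up the currently live brokers from
ZooKeeper, e.g. to rebuild a client's bootstrap list at runtime:

  # IDs of the brokers currently registered (alive) in ZooKeeper.
  bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids

  # Host/port details of broker 0 (the JSON includes its endpoints).
  bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0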
> *Note:*
>
> - Kafka version: 2.0.0
> - ZooKeeper: 3.4.9
> - Broker size: 2 cores, 8 GB RAM (4 GB for Kafka and 4 GB for the OS)
>
> Regards,
> Abhimanyu

--
Thanks!
--Vahid