Here is the entire logic used to rebalance the cluster. It is implemented by this Groovy script (https://github.com/Lowess/Kafka/blob/master/KafkaPartitionRebalancer.groovy).
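Steps #3 and #5 below are where the linked Groovy script actually transforms data: it builds the topics file, then slices the proposed plan into parts of at most 10 partitions. Both can be sketched in Python. This is a minimal sketch, not the script itself. The function names `topics_to_move` and `slice_reassignment` are hypothetical, and the sketch assumes the proposed-reassignment JSON has already been extracted from the `--generate` output.

```python
import json
from typing import List


def topics_to_move(topics: List[str]) -> str:
    """Build the --topics-to-move-json-file payload (step #3)."""
    return json.dumps({"version": 1, "topics": [{"topic": t} for t in topics]})


def slice_reassignment(proposed: str, max_partitions: int = 10) -> List[str]:
    """Split a proposed reassignment plan into smaller plans (step #5).

    `proposed` is the "Proposed partition reassignment" JSON printed by
    --generate (assumed to be already extracted from the tool's output).
    Each returned string is a complete reassignment file holding at most
    `max_partitions` partitions, in the order the tool listed them.
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be >= 1")
    plan = json.loads(proposed)
    partitions = plan["partitions"]
    parts = []
    for start in range(0, len(partitions), max_partitions):
        part = {
            "version": plan.get("version", 1),
            "partitions": partitions[start:start + max_partitions],
        }
        parts.append(json.dumps(part))
    return parts
```

Each string returned by `slice_reassignment` would be written to /tmp/expand-cluster-reassignment.json in turn before running the `--execute` and `--verify` commands. Slicing the flat list in order keeps every part small. The trade-off is that one topic's partitions may end up spread across several parts.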
#1: Query ZooKeeper to get the list of broker IDs.
#2: Query ZooKeeper to get the list of topics.
#3: Generate the topics-to-move.json file, which looks like:

{
  "version": 1,
  "topics": [
    { "topic": "SLOTS" },
    { "topic": "ASSETS" },
    { "topic": "AD_EVENTS" },
    { "topic": "B_IMPRESSION" },
    { "topic": "B_STATISTICS" },
    { "topic": "PAGES" },
    { "topic": "RTB" },
    { "topic": "D_STATISTICS" },
    { "topic": "D_REPORTING" }
  ]
}

#4: Upload this file to the Kafka node (/tmp/topics-to-move.json) and run the following command:

bin/kafka-reassign-partitions.sh --zookeeper ZK_IP:2181 --topics-to-move-json-file /tmp/topics-to-move.json --generate --broker-list "ALL_BROKERS_THAT_ARE_RETURNED_BY_ZOOKEEPER_ON_STEP_#1"

#5: Parse the proposed reassignment JSON returned by the previous step and slice it into smaller JSON files. The number of partitions per file is capped by the Groovy script (10 partitions in this example). Each part looks like:

{
  "version": 1,
  "partitions": [
    { "topic": "D_REPORTING", "partition": 2, "replicas": [ 102311671, 10517222, 102311679 ] },
    { "topic": "AD_EVENTS", "partition": 48, "replicas": [ 102311671, 109715277, 101531906 ] },
    { "topic": "D_STATISTICS", "partition": 47, "replicas": [ 109715277, 10517222, 102311679 ] },
    { "topic": "SLOTS", "partition": 46, "replicas": [ 101131445, 102336284, 10517222 ] },
    { "topic": "RTB", "partition": 48, "replicas": [ 101021441, 102311671, 102336284 ] },
    { "topic": "PAGES", "partition": 14, "replicas": [ 102311679, 102311671, 102336284 ] },
    { "topic": "ASSETS", "partition": 35, "replicas": [ 10517222, 101131445, 102311679 ] },
    { "topic": "B_IMPRESSION", "partition": 34, "replicas": [ 101131445, 102311672, 102311671 ] },
    { "topic": "B_STATISTICS", "partition": 19, "replicas": [ 109715277, 101531906, 102311672 ] },
    { "topic": "AD_EVENTS", "partition": 18, "replicas": [ 109715277, 102311671, 102336284 ] }
  ]
}

#6: Upload the previous JSON file to the Kafka node (/tmp/expand-cluster-reassignment.json) and run the following command:

bin/kafka-reassign-partitions.sh --zookeeper ZK_IP:2181 --reassignment-json-file /tmp/expand-cluster-reassignment.json --execute --broker-list "ALL_BROKERS_THAT_ARE_RETURNED_BY_ZOOKEEPER_ON_STEP_#1"

#7: Repeat the verification step for as long as the output of the following command reports failed partitions:

bin/kafka-reassign-partitions.sh --zookeeper ZK_IP:2181 --reassignment-json-file /tmp/expand-cluster-reassignment.json --verify --broker-list "ALL_BROKERS_THAT_ARE_RETURNED_BY_ZOOKEEPER_ON_STEP_#1"

#8: Repeat steps #6 and #7 with the next JSON part file from step #5 until all of them have run.

Hope this helps.

On Tue, Jul 8, 2014 at 10:31 AM, Clark Haskins <chask...@linkedin.com.invalid> wrote:

> Can you copy/paste the JSON you are passing to the reassignment tool, plus
> the commands? Also, do a describe on your topics.
>
> -Clark
>
> Clark Elliott Haskins III
> LinkedIn DDS Site Reliability Engineer
> Kafka, Zookeeper, Samza SRE
> Mobile: 505.385.1484
> BlueJeans: https://www.bluejeans.com/chaskins
>
> chask...@linkedin.com
> https://www.linkedin.com/in/clarkhaskins
> There is no place like 127.0.0.1
>
> On 7/8/14, 10:26 AM, "Florian Dambrine" <flor...@gumgum.com> wrote:
>
> >I left the tool running for an entire weekend on the test cluster, and on
> >Monday it was still saying "failed"...
> >
> >I have 500 GB per Kafka node, and it is an 8-node cluster.
> >
> >I am also wondering whether I am using the tool correctly. Currently I am
> >running the tool to rebalance everything across the entire cluster. As I
> >have 3 replicas, the tool requires at least 3 brokers.
> >
> >Should I add 3 new Kafka nodes and rebalance some topics onto these new
> >nodes only? I am afraid of unbalancing the cluster with this option.
> >
> >Any suggestions?
> >
> >Thanks for your help.
> >
> >On Mon, Jul 7, 2014 at 9:29 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> >> The failure could mean that the reassignment is still in progress.
> >> If you have lots of data, it may take some time to move the data to new
> >> brokers. You could observe the max lag in each broker to see how far
> >> behind the new replicas are (see
> >> http://kafka.apache.org/documentation.html#monitoring).
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Mon, Jul 7, 2014 at 4:42 PM, Florian Dambrine <flor...@gumgum.com>
> >> wrote:
> >>
> >> > When I run the tool with the --verify option, it says "failed" for some
> >> > partitions.
> >> >
> >> > The problem is that I do not know whether it is a ZooKeeper issue or
> >> > whether the tool really failed.
> >> >
> >> > I once ran into the ZooKeeper issue
> >> > (https://issues.apache.org/jira/browse/KAFKA-1382), and after I killed
> >> > the responsible Kafka broker, the partition switched from failed to
> >> > completed successfully.
> >> >
> >> > What should I do when the Kafka tool says that it failed to move the
> >> > partition?
> >> >
> >> > On Mon, Jul 7, 2014 at 4:33 PM, Clark Haskins
> >> > <chask...@linkedin.com.invalid> wrote:
> >> >
> >> > > How does it get stuck?
> >> > >
> >> > > -Clark
> >> > >
> >> > > On 7/7/14, 3:49 PM, "Florian Dambrine" <flor...@gumgum.com> wrote:
> >> > >
> >> > > >Hi,
> >> > > >
> >> > > >I am trying to add new brokers to an existing 8-node Kafka cluster.
> >> > > >We have around 10 topics, and the number of partitions is set to 50.
> >> > > >In order to test the reassign-partitions script, I tried the
> >> > > >following steps on a sandbox cluster.
> >> > > >
> >> > > >I developed a script that splits the partition reassignment plan
> >> > > >produced by the Kafka tool into smaller pieces (reassigning at most
> >> > > >10 partitions at a time).
> >> > > >
> >> > > >Unfortunately, I faced some issues with the tool, which sometimes gets
> >> > > >stuck on one partition. In that case I have to kill and restart the
> >> > > >three Kafka brokers to which the partition has been relocated (one
> >> > > >broker at a time) to unblock the process.
> >> > > >
> >> > > >Moreover, I have also run into these two issues, which are already in
> >> > > >Jira:
> >> > > >
> >> > > >https://issues.apache.org/jira/browse/KAFKA-1382
> >> > > >https://issues.apache.org/jira/browse/KAFKA-1479
> >> > > >
> >> > > >We really need to add new nodes to our Kafka cluster. Has anybody
> >> > > >already rebalanced a Kafka 0.8.1.1 cluster? What would you advise?
> >> > > >
> >> > > >Thanks, and feel free to ask me if you need more details.
> >> > > >
> >> > > >--
> >> > > >*Florian Dambrine* | Intern, Big Data
> >> > > >*GumGum* <http://www.gumgum.com/> | *Ads that stick*
> >> > > >209-797-3994 | flor...@gumgum.com
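The polling loop in step #7 at the top of this thread can also be sketched in Python. This is a minimal sketch, not the linked Groovy script. It assumes that `--verify` in Kafka 0.8.x prints one line per partition of the form `Reassignment of partition [RTB,48] completed successfully` (or `is still in progress` / `failed`); check this against the output of your own Kafka version. Jun pointed out above that "failed" may only mean the reassignment is still in progress, so the sketch keeps polling until every partition reports completion or a poll limit is reached. The names `parse_verify_output` and `wait_for_part` and the `poll_seconds`/`max_polls` parameters are hypothetical. The command line mirrors the one given in step #7.

```python
import re
import subprocess
import time
from typing import Dict

# Assumed line format of `--verify` output (Kafka 0.8.x), e.g.
#   Reassignment of partition [RTB,48] completed successfully
#   Reassignment of partition [SLOTS,46] is still in progress
#   Reassignment of partition [PAGES,14] failed
STATUS_RE = re.compile(
    r"Reassignment of partition \[(?P<tp>[^\]]+)\] "
    r"(?P<status>completed successfully|is still in progress|failed)"
)
LABELS = {
    "completed successfully": "completed",
    "is still in progress": "in_progress",
    "failed": "failed",
}


def parse_verify_output(output: str) -> Dict[str, str]:
    """Map "topic,partition" to "completed", "in_progress" or "failed"."""
    return {m.group("tp"): LABELS[m.group("status")]
            for m in STATUS_RE.finditer(output)}


def wait_for_part(zk: str, json_file: str, broker_list: str,
                  poll_seconds: float = 30, max_polls: int = 120) -> Dict[str, str]:
    """Re-run --verify until every partition is completed or max_polls is hit.

    "failed" is polled again like "in progress", because per this thread it
    may only mean the data is still being copied. Assumes it is run from the
    Kafka installation directory so that bin/ resolves.
    """
    cmd = ["bin/kafka-reassign-partitions.sh", "--zookeeper", zk,
           "--reassignment-json-file", json_file, "--verify",
           "--broker-list", broker_list]
    statuses: Dict[str, str] = {}
    for _ in range(max_polls):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        statuses = parse_verify_output(result.stdout)
        if statuses and all(s == "completed" for s in statuses.values()):
            break
        time.sleep(poll_seconds)
    return statuses
```

The returned statuses let the caller decide what to do when the poll limit is hit with partitions still unfinished. For example, the caller could apply the broker restart described in this thread before moving on to the next part.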