[ https://issues.apache.org/jira/browse/KAFKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Nigam reassigned KAFKA-1599:
-------------------------------------

    Assignee: Abhishek Nigam

> Change preferred replica election admin command to handle large clusters
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-1599
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1599
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8.2.0
>            Reporter: Todd Palino
>            Assignee: Abhishek Nigam
>              Labels: newbie++
>
> We ran into a problem with a cluster that has 70k partitions where we could
> not trigger a preferred replica election for all topics and partitions using
> the admin tool. Upon investigation, it was determined that this was because
> the JSON object that was being written to the admin znode to tell the
> controller to start the election was 1.8 MB in size. As the default ZooKeeper
> data size limit is 1MB, and it is non-trivial to change, we should come up
> with a better way to represent the list of topics and partitions for this
> admin command.
> I have several thoughts on this so far:
> 1) Trigger the command for all topics and partitions with a JSON object that
> does not include an explicit list of them (i.e. a flag that says "all
> partitions").
> 2) Use a more compact JSON representation. Currently, the JSON contains a
> 'partitions' key which holds a list of dictionaries that each have a 'topic'
> and 'partition' key, and there must be one list item for each partition. This
> results in a lot of unneeded repetition of key names. Changing this to a
> format like this would be much more compact:
> {"topics": {"topicName1": [0, 1, 2, 3], "topicName2": [0, 1]}, "version": 1}
> 3) Use a representation other than JSON. Strings are inefficient. A binary
> format would be the most compact. This does put a greater burden on tools and
> scripts that do not use the inbuilt libraries, but that burden is not too high.
> 4) Use a representation that involves multiple znodes.
> A structured tree in the admin command would probably provide the most
> complete solution. However, we would need to make sure to not exceed the data
> size limit with a wide tree (the list of children for any single znode cannot
> exceed the ZK data size of 1MB).
> Obviously, there could be a combination of #1 with a change in the
> representation, which would likely be appropriate as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
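Options 1 and 2 can be compared by payload size with a short sketch. It assumes the current payload has the shape described in item 2, {"version": 1, "partitions": [{"topic": ..., "partition": ...}]}; the `all_partitions` field name for option 1 and the synthetic 700 x 100 cluster are hypothetical, chosen only to land near the 70k-partition scale in the report. Byte counts depend on topic-name lengths and whitespace, so they will not reproduce the 1.8 MB figure exactly.

```python
import json
from collections import defaultdict

# Compact separators so the byte counts reflect the data, not whitespace.
SEPARATORS = (",", ":")


def current_payload(partitions):
    """Per-partition format from item 2: one {"topic", "partition"} dict each."""
    entries = [{"topic": t, "partition": p} for t, p in partitions]
    return json.dumps({"version": 1, "partitions": entries}, separators=SEPARATORS)


def compact_payload(partitions):
    """Option 2: each topic name appears once, followed by its partition ids."""
    by_topic = defaultdict(list)
    for topic, partition in partitions:
        by_topic[topic].append(partition)
    topics = {t: sorted(ps) for t, ps in by_topic.items()}
    return json.dumps({"version": 1, "topics": topics}, separators=SEPARATORS)


def all_partitions_payload():
    """Option 1: a flag instead of a list (the field name is hypothetical)."""
    return json.dumps({"version": 1, "all_partitions": True}, separators=SEPARATORS)


def decode_compact(payload):
    """Expand the compact format back into a set of (topic, partition) pairs."""
    data = json.loads(payload)
    return {(t, p) for t, ps in data["topics"].items() for p in ps}


if __name__ == "__main__":
    # Hypothetical cluster: 700 topics x 100 partitions = 70,000 partitions.
    partitions = [(f"example-topic-{i:04d}", p) for i in range(700) for p in range(100)]
    for name, payload in [
        ("current", current_payload(partitions)),
        ("compact", compact_payload(partitions)),
        ("all-partitions flag", all_partitions_payload()),
    ]:
        print(f"{name:>20}: {len(payload.encode('utf-8')):,} bytes")
```

The compact form drops the repeated "topic" and "partition" keys, so each additional partition costs only its digits plus a comma, and each topic name is written once instead of once per partition.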