Thanks Jun,

As you said, rack-aware assignment needs to be supported during topic creation, adding of partitions, and partition re-assignment.
In order to reduce the risk of using a different value of max-rack-replication during different calls to those functions, I'd like to carry / persist that information as part of the topic config. This way the option is mandatory during topic creation and optional during partition re-assignment. Here is my proposal:

Topic Create:
- max-rack-replication is mandatory
- store max-rack-replication as a topic config

Partition Add:
- Use the stored value of max-rack-replication, so the new partition is added with the same max-rack-replication value as the rest of the partitions in the topic.

Partition Re-assign:
- If max-rack-replication is provided, treat this as an altered config. This forces a re-validation of the replica assignments of all partitions for the topic. If this succeeds, save the updated topic config.
- If max-rack-replication is not provided, use the saved value in the topic config.

Alter Topic:
- If someone overrides the max-rack-replication value, we re-validate all replica assignments, as with partition re-assignment.

I think this strategy will be less error prone, making it impossible to have multiple partitions in a topic using different max-rack-replication values.

Let me know if this sounds ok to you,

Joris

On Sun, Jan 26, 2014 at 9:15 AM, Jun Rao <jun...@gmail.com> wrote:

> Replica assignments can change during topic creation, adding partitions,
> and reassigning partitions. So, it would be good to integrate rack-aware
> assignment in those cases.
>
> For compatibility: doing a rolling upgrade of a Kafka cluster means that,
> at a given point in time, some brokers could be registered in the old
> format and some others could be registered in the new format. Both the
> consumer and the controller
> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Internals)
> read the broker registration in ZK. So, we need to make sure that the new
> code can read the old format and the old code can read the new format too.
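The compatibility requirement described in the quoted paragraph above (new code reading the old broker-registration format, and old code tolerating the new one) can be sketched as follows. This is an illustrative Python sketch, not Kafka's actual Scala code; the JSON field names, version numbers, and the default rack id of 0 are assumptions drawn from this thread, not the committed schema:

```python
import json

def parse_broker_registration(raw):
    """Parse a broker registration blob from ZooKeeper.

    Tolerates both a hypothetical old format (version 1, no rack field)
    and a new format (version 2, with a "rackid" field). Old readers
    simply ignore unknown fields; new readers fall back to a default
    when "rackid" is absent, so reads are safe in both directions
    during a rolling upgrade.
    """
    data = json.loads(raw)
    return {
        "host": data["host"],
        "port": data["port"],
        # Missing in version-1 registrations: supply a reasonable default.
        "rackid": data.get("rackid", 0),
    }

# A version-1 registration (old format, no rack information):
old = '{"version": 1, "host": "broker1", "port": 9092}'
# A version-2 registration (new format, carries the rack id):
new = '{"version": 2, "host": "broker2", "port": 9092, "rackid": 3}'

assert parse_broker_registration(old)["rackid"] == 0  # default applied
assert parse_broker_registration(new)["rackid"] == 3
```

The same idea applies regardless of language: readers key off optional fields with defaults rather than off the version number alone, which is what makes bumping the version from 1 to 2 backward compatible.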
> For example, in Broker.createBroker(), if the code is looking for a field
> "rackid" but couldn't find it, we need to be able to provide a reasonable
> default.
>
> Also, take a look at the format of broker registration in ZK in
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
> Since we are changing the format, we should probably change the version
> from 1 to 2. However, we can make this change backward compatible.
>
> Thanks,
>
> Jun
>
> On Fri, Jan 24, 2014 at 1:57 PM, Joris VanRemoortere <
> jvanremoort...@tagged.com> wrote:
>
> > Working on the above.
> >
> > Since trunk currently allows topic configs and altering topics, does it
> > make sense to you to make this a topic config so that it is kept in sync
> > between topic alterations and partition adds?
> >
> > For compatibility: I can make the zookeeper state parsing backwards /
> > forwards compatible. I don't think I can make the readFrom / writeTo
> > ByteBuffer calls compatible, since chained calls of these during API
> > parsing are size dependent. Can you clarify exactly what you expect to
> > be compatible?
> >
> > I'm also not very familiar with the consequences of changing the broker
> > version in the zookeeper registration. Can you please provide a
> > reference to documentation for this?
> >
> > Thanks,
> >
> > Joris
> >
> > On Fri, Jan 24, 2014 at 9:43 AM, Jun Rao (JIRA) <j...@apache.org> wrote:
> >
> > > [ https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881193#comment-13881193 ]
> > >
> > > Jun Rao commented on KAFKA-1215:
> > > --------------------------------
> > >
> > > Thanks for the patch. Looks good overall. Some comments.
> > >
> > > 1. KafkaConfig:
> > > 1.1 We need a config for the default max-rack-replication for auto
> > > topic creation.
> > > 1.2 rackId: We probably don't want to make this a required property.
> > > So, perhaps we can default it to 0?
> > >
> > > 2. AdminUtils.assignReplicasToBrokers():
> > > 2.1 Could you add some comments on the rack-aware assignment algorithm?
> > > 2.2 It's a bit weird for this method to take zkclient in the input. We
> > > probably can pass in a list of Broker objects instead.
> > >
> > > 3. Unit tests: I suggest that we leave most existing tests intact by
> > > keeping the rackId default, and add a new test for rack-aware
> > > assignment.
> > >
> > > 4. Compatibility test: It seems that the change to the broker format
> > > in ZK is backward compatible. Could you double check? For example, an
> > > old reader (controller, consumer, etc.) should be able to parse a
> > > broker registered in the new format, and a new reader should be able
> > > to parse a broker registered in the old format. Also, we probably
> > > should increase the version in the ZK registration for the broker.
> > >
> > > 5. Could you rebase to trunk?
> > >
> > > > Rack-Aware replica assignment option
> > > > ------------------------------------
> > > >
> > > > Key: KAFKA-1215
> > > > URL: https://issues.apache.org/jira/browse/KAFKA-1215
> > > > Project: Kafka
> > > > Issue Type: Improvement
> > > > Components: replication
> > > > Affects Versions: 0.8.0, 0.8.1
> > > > Reporter: Joris Van Remoortere
> > > > Assignee: Neha Narkhede
> > > > Fix For: 0.8.0, 0.8.1
> > > >
> > > > Attachments: rack_aware_replica_assignment_v1.patch
> > > >
> > > > Adding a rack-id to the Kafka config. This rack-id can be used
> > > > during replica assignment by using the max-rack-replication argument
> > > > in the admin scripts (create topic, etc.). By default the original
> > > > replica assignment algorithm is used, because max-rack-replication
> > > > defaults to -1. max-rack-replication > -1 is not honored if you are
> > > > doing manual replica assignment (preferred).
> > > > If this looks good I can add some test cases specific to the
> > > > rack-aware assignment.
> > > > I can also port this to trunk. We are currently running 0.8.0 in
> > > > production and need this, so I wrote the patch against that.
> > >
> > > --
> > > This message was sent by Atlassian JIRA
> > > (v6.1.5#6160)
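For reference, the constraint the quoted issue describes (at most max-rack-replication replicas of any one partition per rack, with -1 falling back to unconstrained assignment) could be sketched as follows. This is a minimal illustrative Python sketch under those assumptions, not the algorithm from the attached patch; the broker/rack representation and the round-robin traversal are invented for the example:

```python
import itertools

def assign_replicas(brokers, num_partitions, replication_factor,
                    max_rack_replication=-1):
    """Round-robin replica assignment with an optional per-rack cap.

    brokers: list of (broker_id, rack_id) pairs.
    max_rack_replication == -1 disables the rack constraint, matching
    the default behavior described in KAFKA-1215. Illustrative only.
    """
    assignment = {}
    # Rotate the starting broker per partition to spread leaders around.
    starts = itertools.cycle(range(len(brokers)))
    for partition in range(num_partitions):
        replicas, rack_count = [], {}
        start = next(starts)
        for offset in range(len(brokers)):
            broker_id, rack = brokers[(start + offset) % len(brokers)]
            if (max_rack_replication > -1
                    and rack_count.get(rack, 0) >= max_rack_replication):
                continue  # this rack already holds its allowed replicas
            replicas.append(broker_id)
            rack_count[rack] = rack_count.get(rack, 0) + 1
            if len(replicas) == replication_factor:
                break
        if len(replicas) < replication_factor:
            raise ValueError("not enough brokers/racks to satisfy constraint")
        assignment[partition] = replicas
    return assignment

# Four brokers across two racks; cap of 1 replica per rack forces each
# partition's two replicas onto different racks.
brokers = [(0, "rack-a"), (1, "rack-a"), (2, "rack-b"), (3, "rack-b")]
print(assign_replicas(brokers, num_partitions=2, replication_factor=2,
                      max_rack_replication=1))
```

One can see from the sketch why re-validating all partitions on a config change matters: the cap is enforced per partition at assignment time, so partitions assigned under different max-rack-replication values could otherwise coexist in one topic.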