Thanks Jun,

As you said, rack-aware assignment needs to be supported during topic
creation, adding of partitions, and partition re-assignment.

In order to reduce the risk of using a different value of
max-rack-replication during different calls to those functions, I'd like to
carry / persist that information as part of the topic config. This way the
option is mandatory during topic creation, and optional partition re-assign.

Here is my proposal:

Topic Create:

   - max-rack-replication mandatory
   - store max-rack-replication as a topic config

Partition Add:

   - Use the stored value of max-rack-replication to add another partition
   using the same max-rack-replication value as the rest of the partitions in
   the topic

Partition Re-assign:

   - If max-rack-replication is provided, treat this as an altered config.
   This forces a re-validation of all replica assignments of all partitions
   for the topic. If this succeeds, save the updated topic config.
   - If max-rack-replication is not provided, used the saved value in the
   topic config.

Alter Topic:

   - If someone over-rides the max-rack-replication value, we re-validate
   all replica assignments as with partition re-assignment.

I think this strategy will be less error prone, making it impossible to
have multiple partitions in a topic using different max-rack-replication
values. Let me know if this sounds ok to you,

Joris


On Sun, Jan 26, 2014 at 9:15 AM, Jun Rao <jun...@gmail.com> wrote:

> Replica assignments can change during topic creation, adding partitions,
> and reassign partitions. So, it would be good to integrate rack-aware
> assignment in those cases.
>
> For compatibility, to do a rolling upgrade of a Kafka cluster means that at
> a given point of time, some brokers could be registered in the old format
> and some others could be registered in the new format. Both the consumer
> and the controller (
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Internals
> )
> read the broker registration in ZK. So, we need to make sure that the new
> code can read the old format and the old code can read the new format too.
> For example, in Broker.createBroker(), if the code is looking for a field
> "rackid" but couldn't find it, we need to be able to provide a reasonable
> default.
>
> Also, take a look at the format of broker registration in ZK in
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
> .
> Since we are changing the format, we should probably change the
> version
> from 1 to 2. However, we can make this change backward compatible.
>
> Thanks,
>
> Jun
>
>
> On Fri, Jan 24, 2014 at 1:57 PM, Joris VanRemoortere <
> jvanremoort...@tagged.com> wrote:
>
> > Working on the above.
> >
> > Since trunk currently allows Topic configs, and altering topics, does it
> > make sense to you to make this a topic config so that it is kept in sync
> > between topic alterations and partition adds?
> >
> > For compatibility: I can make the zookeeper state parsing backwards /
> > forwards compatible. I don't think I can make the readFrom / writeTo
> > ByteBuffer compatible since chained calls of these during api parsing are
> > size dependent? Can you clarify exactly what you expect to be compatible?
> >
> > I'm also not very familiar with the consequences of changing the broker
> > version in zookeeper registration. Can you please provide a reference to
> > documentation for this?
> >
> > Thanks,
> >
> > Joris
> >
> >
> > On Fri, Jan 24, 2014 at 9:43 AM, Jun Rao (JIRA) <j...@apache.org> wrote:
> >
> > >
> > >     [
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881193#comment-13881193
> > ]
> > >
> > > Jun Rao commented on KAFKA-1215:
> > > --------------------------------
> > >
> > > Thanks for the patch. Looks good over all. Some comments.
> > >
> > > 1. KafkaConfig:
> > > 1.1 We need a config for default max-rack-replication for auto topic
> > > creation.
> > > 1.2 rackId: We probably don't want to make this a required property.
> So,
> > > perhaps we can default it to 0?
> > >
> > > 2. AdminUtils.assignReplicasToBrokers():
> > > 2.1 Could you add some comments on the rack-aware assignment algorithm?
> > > 2.2 It's a bit weird for this method to take zkclient in the input. We
> > > probably can pass in a list of Broker objects instead.
> > >
> > > 3. Unit tests: I suggest that we leave most existing tests intact by
> > > keeping the rackId default and add a new test for rack-aware
> assignment.
> > >
> > > 4. Compatibility test: It seems that the changes for the broker format
> in
> > > ZK is backward compatible. Could you double check? For example, an old
> > > reader (controller, consumer, etc) should be able to parse the broker
> > > registered in new format and a new reader should be able to parse the
> > > broker registered in the old format. Also, we probably should increase
> > the
> > > version in the ZK registration for the broker.
> > >
> > > 5. Could you rebase to trunk?
> > >
> > > > Rack-Aware replica assignment option
> > > > ------------------------------------
> > > >
> > > >                 Key: KAFKA-1215
> > > >                 URL:
> https://issues.apache.org/jira/browse/KAFKA-1215
> > > >             Project: Kafka
> > > >          Issue Type: Improvement
> > > >          Components: replication
> > > >    Affects Versions: 0.8.0, 0.8.1
> > > >            Reporter: Joris Van Remoortere
> > > >            Assignee: Neha Narkhede
> > > >             Fix For: 0.8.0, 0.8.1
> > > >
> > > >         Attachments: rack_aware_replica_assignment_v1.patch
> > > >
> > > >
> > > > Adding a rack-id to kafka config. This rack-id can be used during
> > > replica assignment by using the max-rack-replication argument in the
> > admin
> > > scripts (create topic, etc.). By default the original replication
> > > assignment algorithm is used because max-rack-replication defaults to
> -1.
> > > max-rack-replication > -1 is not honored if you are doing manual
> replica
> > > assignment (preffered).
> > > > If this looks good I can add some test cases specific to the
> rack-aware
> > > assignment.
> > > > I can also port this to trunk. We are currently running 0.8.0 in
> > > production and need this, so i wrote the patch against that.
> > >
> > >
> > >
> > > --
> > > This message was sent by Atlassian JIRA
> > > (v6.1.5#6160)
> > >
> >
>

Reply via email to