#2. /brokers/topics/[topic] stores the replica assignment for all partitions in the topic. /brokers/topics/[topic]/partitions/[partition_id]/state stores the leader and ISR for a single partition. We did it this way because the leader and ISR need to be updated on a per-partition basis.
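To make the split concrete, here is a minimal Python sketch of the two znodes. It does not talk to ZooKeeper. The JSON payloads are illustrative samples modeled on the 0.8.x layout described on the wiki page, the topic name "events" is made up, and all helper names are hypothetical.

```python
import json


def topic_path(topic):
    """Path of the znode holding the replica assignment for a whole topic."""
    return f"/brokers/topics/{topic}"


def partition_state_path(topic, partition_id):
    """Path of the znode holding leader/ISR for one partition."""
    return f"{topic_path(topic)}/partitions/{partition_id}/state"


# Assumed sample payloads, not read from a live cluster.
TOPIC_ZNODE = '{"version": 1, "partitions": {"0": [1, 2], "1": [2, 3]}}'
STATE_ZNODE = (
    '{"controller_epoch": 1, "leader": 1, "version": 1, '
    '"leader_epoch": 0, "isr": [1, 2]}'
)


def replica_assignment(raw_topic):
    """Map partition id -> list of replica broker ids for the topic."""
    data = json.loads(raw_topic)
    return {int(p): replicas for p, replicas in data["partitions"].items()}


def leader_and_isr(raw_state):
    """Return (leader, isr) for a single partition."""
    state = json.loads(raw_state)
    return state["leader"], state["isr"]


def with_isr(raw_state, new_isr):
    """Return a new state payload with only the ISR replaced.

    Only this one partition's znode would be rewritten; the topic-level
    assignment znode and the other partitions are left alone.
    """
    state = json.loads(raw_state)
    state["isr"] = list(new_isr)
    return json.dumps(state)
```

An ISR change on partition 0 then touches only partition_state_path("events", 0), which is the point of keeping leader/ISR in a per-partition znode.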
#4. Yes, what you observed is the correct behavior.

#7. Right, the controller sends the topic metadata to the broker on startup. Every broker registers its id under /brokers/ids/.

Thanks,

Jun

On Wed, Apr 8, 2015 at 3:01 AM, Jason Guo (jguo2) <jg...@cisco.com> wrote:

> Thanks for your response. I have some further questions below, marked in
> *green*.
>
> -----Original Message-----
> From: Jun Rao [mailto:j...@confluent.io <j...@confluent.io>]
> Sent: April 08, 2015 5:04
> To: us...@kafka.apache.org
> Cc: dev@kafka.apache.org
> Subject: Re: Is there a complete Kafka 0.8.* replication design document
>
> Yes, the wiki is a bit old. You can find out more about replication in the
> following links.
> http://kafka.apache.org/documentation.html#replication
> http://www.slideshare.net/junrao/kafka-replication-apachecon2013
>
> #1, #2, #8. See the ZK layout in
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
> *[Jason] I didn't find an answer to question #2. We already store the
> partition/replica/leader information in /brokers/topics/[topic], so why do we
> still need /brokers/topics/[topic]/partitions/[partition_id]/state?*
>
> #3. Adding partitions is now done by updating /brokers/topics/[topic]
> directly.
>
> #4. For deleting a topic, the ZK path
> /admin/delete_topics/[topic_to_be_deleted]
> is created and removed after the deletion completes.
> *[Jason] From my observation, /admin/delete_topics will not be
> automatically deleted. Its child node can be automatically deleted.*
>
> #5 LeaderAndISRCommand should be the same as LeaderAndISRRequest.
>
> #6 This is to take care of partitions that have been deleted while the
> broker is down. The implementation doesn't rely on the special INIT flag.
> Instead, it expects the very first LeaderAndISRRequest to include all
> valid partitions. Local partitions not in that list will be deleted.
>
> #7 Only the controller needs to read the replica assignment.
> The controller can be started before the broker registers itself. This will be
> handled through ZK watchers.
> *[Jason] Do you mean the broker no longer needs to read the replica assignment
> on startup? (If yes, this is different from the V3 wiki.)*
> *Another question: does the broker need to add its id to /brokers/ids/?*
>
> #9 The high level algorithm described there is still valid. For the
> implementation, you can take a look at ReplicaManager.
>
> Thanks,
>
> Jun
>
> On Mon, Apr 6, 2015 at 7:51 PM, Jason Guo (jguo2) <jg...@cisco.com> wrote:
>
> > Hi,
> >
> > These days I have been focusing on the Kafka 0.8 replication design
> > and found three replication design proposals on the wiki (according
> > to the document, the V3 version is used in the Kafka 0.8 release).
> > But the V3 proposal is not complete and is inconsistent with
> > the release.
> > Is there a complete Kafka 0.8 replication design document?
> >
> > Here are some of my questions about the Kafka 0.8 replication design.
> > #1 According to V3, /brokers/topics/[topic]/[partition_id]/leaderAndISR
> > stores the leader and ISR of a partition. However, in the 0.8.2 release there
> > is no such znode; instead, it uses
> > /brokers/topics/[topic]/partitions/[partition_id]/state to store the
> > leader and ISR of a partition.
> > #2 In /brokers/topics/[topic], we can get all the ISR for all
> > partitions in a certain topic, so why do we need
> > /brokers/topics/[topic]/partitions/[partition_id]/state?
> > #3 I didn't find /admin/partitions_add/[topic]/[partition_id] and
> > /admin/partitions_remove/[topic]/[partition_id] while adding and
> > removing partitions with bin/kafka-topics.sh. Is this deprecated in
> > the 0.8 release?
> > #4 I found that only these two znodes under /admin are
> > automatically removed after the action completes:
> > /admin/reassign_partitions/ and /admin/preferred_replica_election/. But
> > why is this znode (/admin/delete_topics/) not removed automatically?
> > #5 What's the LeaderAndISRCommand in Scenario A in V3? Is it the
> > same as LeaderAndISRRequest?
> > #6 For Scenario D, when a certain broker becomes the controller, it
> > will send a LeaderAndISRRequest to brokers with a special flag INIT.
> > For Scenario C, when the broker receives a LeaderAndISRRequest with the INIT
> > flag, it will delete all local partitions not in set_p. Why do we need to
> > delete all local partitions when the controller changes?
> > #7 For Scenario E (broker startup), the first step is to read the
> > replica assignment. Doesn't it need to add its id to /brokers/ids first?
> > #8 Scenario H. Add/remove partitions to an existing topic. In
> > my test, I didn't find such znodes for the PartitionRemove
> > Path/PartitionAdd Path in Zookeeper. Is this approach for partition
> > adding/deleting deprecated? In fact, I didn't observe any znode change
> > while adding/deleting partitions. So what's the process of Kafka
> > partition adding/deleting?
> > #9 Scenario G seems inconsistent with the released one.
> >
> > Regards,
> > Jason
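
The cleanup described in the answer to #6 (the first LeaderAndISRRequest lists all valid partitions, and local partitions missing from that list are deleted) boils down to a set difference. The sketch below only illustrates that idea under the assumption that a partition is identified by a (topic, partition_id) pair; the function name is hypothetical and this is not the actual ReplicaManager code.

```python
def stale_local_partitions(local_partitions, first_request_partitions):
    """Return the local partitions that the first request does not list.

    Both arguments are iterables of (topic, partition_id) tuples. Anything
    present on local disk but absent from the first LeaderAndISRRequest is
    treated as deleted while the broker was down.
    """
    return set(local_partitions) - set(first_request_partitions)


# Hypothetical example: topic "old" was deleted while the broker was down.
local = [("events", 0), ("events", 1), ("old", 0)]
valid = [("events", 0), ("events", 1)]
to_delete = stale_local_partitions(local, valid)
```

Because the decision depends only on the contents of that first request, no special INIT flag is needed.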