Updated KIP according to Jun's comment and included changes to TMR.

On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Allen,

A couple of minor comments on the KIP.

1. The version of the broker JSON string says 2. It should be 3.

2. The new version of UpdateMetadataRequest should be 2, instead of 1. Could you include the full wire protocol of version 2 of UpdateMetadataRequest and highlight the changed part?

Thanks,

Jun

On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <allenxw...@gmail.com> wrote:

Jun and I had a chance to discuss it in a meeting and it is agreed to change the TMR in a different patch.

I can change the KIP to include rack in TMR. The essential change is to add rack into class BrokerEndPoint and make TMR version aware.

On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

Jun/Allen -

Did we ever actually agree on whether we should evolve the TMR to include rack info or not?
I don't feel strongly about it, but if it's the right thing to do we should probably do it in this KIP (can be a separate patch). It isn't a large change.

Aditya

On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxw...@gmail.com> wrote:

Added the rolling upgrade instruction in the KIP, similar to those in 0.9.0 release notes.

On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxw...@gmail.com> wrote:

Hi Jun,

The reason that TopicMetadataResponse is not included in the KIP is that it currently is not version aware. So we need to introduce a version to it in order to ensure backward compatibility. It seems to me a big change. Do we want to couple it with this KIP? Do we need to further discuss what information to include in the new version besides rack? For example, should we include broker security protocol in TopicMetadataResponse?

The other option is to make it a separate KIP to make TopicMetadataResponse version aware and decide what to include, and make this KIP focus on the rack aware algorithm, admin tools and related changes to inter-broker protocol.

Thanks,
Allen

On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <j...@confluent.io> wrote:

Allen,

Thanks for the proposal. A few comments.

1. Since this KIP changes the inter broker communication protocol (UpdateMetadataRequest), we will need to document the upgrade path (similar to what's described in http://kafka.apache.org/090/documentation.html#upgrade).

2. It might be useful to include the rack info of the broker in TopicMetadataResponse. This can be useful for administrative tasks, as well as read affinity in the future.

Jun

On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxw...@gmail.com> wrote:

If there are no more comments I would like to call for a vote.

On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com> wrote:

KIP is updated with more details and how to handle the situation where rack information is incomplete.

In the situation where rack information is incomplete, but we want to continue with the assignment, I have suggested to ignore all rack information and fallback to original algorithm.
The reason is explained below:

The other options are to assume that the broker without the rack belong to its own unique rack, or they belong to one "default" rack. Either way we choose, it is highly likely to result in uneven number of brokers in racks, and it is quite possible that the "made up" racks will have much fewer number of brokers. As I explained in the KIP, uneven number of brokers in racks will lead to uneven distribution of replicas among brokers (even though the leader distribution is still even). The brokers in the rack that has fewer number of brokers will get more replicas per broker than brokers in other racks.

Given this fact and the replica assignment produced will be incorrect anyway from rack aware point of view, ignoring all rack information and fallback to the original algorithm is not a bad choice since it will at least have a better guarantee of replica distribution.

Also for command line tools it gives user a choice if for any reason they want to ignore rack information and fallback to the original algorithm.

On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com> wrote:

I am busy with some time pressing issues for the last few days. I will think about how the incomplete rack information will affect the balance and update the KIP by early next week.

Thanks,
Allen

On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io> wrote:

Few suggestions on improving the KIP

*If some brokers have rack, and some do not, the algorithm will throw an exception. This is to prevent incorrect assignment caused by user error.*

In the KIP, can you clearly state the user-facing behavior when some brokers have rack information and some don't? Which actions and requests will error out and how?

*Even distribution of partition leadership among brokers*

There is some information about arranging the sorted broker list interlaced with rack ids. Can you describe the changes to the current algorithm in a little more detail? How does this interlacing work if only a subset of brokers have the rack id configured? Does this still work if uneven # of brokers are assigned to each rack? It might work, I'm looking for more details on the changes, since it will affect the behavior seen by the user - imbalance on either the leaders or data or both.

On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:

I think this sounds reasonable. Anyone else have comments?

Aditya

On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com> wrote:

During the discussion in the hangout, it was mentioned that it would be desirable that consumers know the rack information of the brokers so that they can consume from the broker in the same rack to reduce latency. As I understand this will only be beneficial if consumer can consume from any broker in ISR, which is not possible now.

I suggest we skip the change to TMR. Once the change is made to consumer to be able to consume from any broker in ISR, the rack information can be added to TMR.

Another thing I want to confirm is command line behavior. I think the desirable default behavior is to fail fast on command line for incomplete rack mapping. The error message can include further instruction that tells the user to add an extra argument (like "--allow-partial-rackinfo") to suppress the error and do an imperfect rack aware assignment. If the default behavior is to allow incomplete mapping, the error can still be easily missed.
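
A minimal Python sketch of the fail-fast default described above. The function name, the exception type and the dictionary form of the broker-to-rack mapping are illustrative assumptions, not part of the KIP or of TopicCommand; only the idea of an opt-in switch such as "--allow-partial-rackinfo" comes from the discussion.

```python
class IncompleteRackInfoError(Exception):
    """Raised when only some brokers report a rack (hypothetical error type)."""


def check_rack_mapping(broker_racks, allow_partial_rackinfo=False):
    """Return the broker -> rack entries usable for assignment.

    broker_racks maps broker id -> rack name, or None when no rack is set.
    """
    known = {b: r for b, r in broker_racks.items() if r is not None}
    missing = sorted(b for b, r in broker_racks.items() if r is None)
    if not missing or not known:
        # Complete mapping, or no rack info anywhere: nothing to report.
        return known
    if not allow_partial_rackinfo:
        # Default: fail fast and tell the user how to override.
        raise IncompleteRackInfoError(
            "brokers %s have no rack; pass --allow-partial-rackinfo to "
            "continue with an imperfect rack aware assignment" % missing
        )
    # Opt-in: continue with whatever rack information exists.
    return known
```

With this shape the error cannot be missed by default, and the override is an explicit user decision.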

The affected command line tools are TopicCommand and ReassignPartitionsCommand.

Thanks,
Allen

On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:

Hi Allen,

For TopicMetadataResponse to understand version, you can bump up the request version itself. Based on the version of the request, the response can be appropriately serialized. It shouldn't be a huge change. For example: We went through something similar for ProduceRequest recently (https://reviews.apache.org/r/33378/)
I guess the reason protocol information is not included in the TMR is because the topic itself is independent of any particular protocol (SSL vs Plaintext). Having said that, I'm not sure we even need rack information in TMR. What usecase were you thinking of initially?

For 1 - I'd be fine with adding an option to the command line tools that check rack assignment. For e.g. "--strict-assignment" or something similar.

Aditya

On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxw...@gmail.com> wrote:

For 2 and 3, I have updated the KIP. Please take a look. One thing I have changed is removing the proposal to add rack to TopicMetadataResponse. The reason is that unlike UpdateMetadataRequest, TopicMetadataResponse does not understand version. I don't see a way to include rack without breaking old version of clients. That's probably why secure protocol is not included in the TopicMetadataResponse either. I think it will be a much bigger change to include rack in TopicMetadataResponse.

For 1, my concern is that doing rack aware assignment without complete broker to rack mapping will result in assignment that is not rack aware and fail to provide fault tolerance in the event of rack outage. This kind of problem will be difficult to surface.
And the cost of this problem is high: you have to do partition reassignment if you are lucky to spot the problem early on or face the consequence of data loss during real rack outage.

I do see the concern of fail-fast as it might also cause data loss if producer is not able to produce the message due to topic creation failure. Is it feasible to treat dynamic topic creation and command tools differently? We allow dynamic topic creation with incomplete broker-rack mapping and fail fast in command line. Another option is to let user determine the behavior for command line. For example, by default fail fast in command line but allow incomplete broker-rack mapping if another switch is provided.

On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

Hey Allen,

1. If we choose fail fast topic creation, we will have topic creation failures while upgrading the cluster.
I really doubt we want this behavior. Ideally, this should be invisible to clients of a cluster. Currently, each broker is effectively its own rack. So we probably can use the rack information whenever possible but not make it a hard requirement. To extend Gwen's example, one badly configured broker should not degrade topic creation for the entire cluster.

2. Upgrade scenario - Can you add a section on the upgrade piece to confirm that old clients will not see errors? I believe ZookeeperConsumerConnector reads the Broker objects from ZK. I wanted to confirm that this will not cause any problems.

3. Could you elaborate your proposed changes to the UpdateMetadataRequest in the "Public Interfaces" section?
Personally, I find this format easy to read in terms of wire protocol changes:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest

Aditya

On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com> wrote:

KIP is updated to include rack as an optional property for broker. Please take a look and let me know if more details are needed.

For the case where some brokers have rack and some do not, the current KIP uses the fail-fast behavior. If there are concerns, we can further discuss this in the email thread or next hangout.

On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:

That's a good question.
I can think of three actions if the rack information is incomplete:

1. Treat the node without rack as if it is on its unique rack
2. Disregard all rack information and fallback to current algorithm
3. Fail-fast

Now I think about it, one and three make more sense. The reason for fail-fast is that a user mistake of not providing the rack may never be found if we tolerate it, the assignment may not be rack aware as the user expected, and this creates debugging problems when things fail.

What do you think? If not fail-fast, is there any way we can make the user error stand out?

On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:

Thanks!
Just to clarify, when some brokers have rack assignment and some don't, do we act like none of them have it? or like those without assignment are in their own rack?

The first scenario is good when first setting up rack-awareness, but the second makes more sense for on-going maintenance (I can totally see someone adding a node and forgetting to set the rack property, we don't want this to change behavior for anything except the new node).

What do you think?

Gwen

On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote:

For scenario 1:

- Add the rack information to broker property file or dynamically set it in the wrapper code to bootstrap Kafka server. You would do that for all brokers and restart the brokers one by one.

In this scenario, the complete broker to rack mapping may not be available until every broker is restarted. During that time we fall back to default replica assignment algorithm.

For scenario 2:

- Add the rack information to broker property file or dynamically set it in the wrapper code and start the broker.

On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote:

Can you clarify the workflow for the following scenarios:

1. I currently have 6 brokers and want to add rack information for each
2. I'm adding a new broker and I want to specify which rack it belongs on while adding it.

Thanks!

On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> wrote:

We discussed the KIP in the hangout today. The recommendation is to make rack as a broker property in ZooKeeper. For users with existing rack information stored somewhere, they would need to retrieve the information at broker start up and dynamically set the rack property, which can be implemented as a wrapper to bootstrap broker. There will be no interface or pluggable implementation to retrieve the rack information.
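
The wrapper approach could look roughly like the following Python sketch. It assumes the rack is passed to the broker as a "broker.rack" entry in its properties file (the thread does not name the property) and that the rack comes from some site-specific lookup; lookup_rack and the sample properties are hypothetical placeholders.

```python
def render_broker_properties(base_properties_text, rack, key="broker.rack"):
    """Return properties text with the rack entry set, replacing any old one.

    `key` is an assumed property name; the thread does not name it.
    """
    kept = [
        line for line in base_properties_text.splitlines()
        if line.split("=", 1)[0].strip() != key
    ]
    kept.append("%s=%s" % (key, rack))
    return "\n".join(kept) + "\n"


def lookup_rack():
    """Hypothetical stand-in for a site-specific rack lookup."""
    return "rack-a"


if __name__ == "__main__":
    base = "broker.id=1\nzookeeper.connect=localhost:2181\n"
    # A real wrapper would write this out and then start the broker with it.
    print(render_broker_properties(base, lookup_rack()), end="")
```

Because the value is only read at start up, changing a broker's rack under this scheme means regenerating the file and restarting that broker.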

The assumption is that you always need to restart the broker to make a change to the rack.

Once the rack becomes a broker property, it will be possible to make rack part of the meta data to help the consumer choose which in sync replica to consume from as part of the future consumer enhancement.

I will update the KIP.

Thanks,
Allen

On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> wrote:

I attended Tuesday's KIP hangout but this KIP was not discussed due to time constraint.

However, after hearing discussion of KIP-35, I have the feeling that incompatibility (caused by new broker property) between brokers with different versions will be solved there. In addition, having rack in broker property as meta data may also help consumers in the future. So I am open to adding rack property to broker.

Hopefully we can discuss this in the next KIP hangout.

On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> wrote:

Can you send me the information on the next KIP hangout?

Currently the broker-rack mapping is not cached.
In KafkaApis, RackLocator.getRackInfo() is called each time the mapping is needed for auto topic creation. This will ensure latest mapping is used at any time.

The ability to get the complete mapping makes it simple to reuse the same interface in command line tools.

On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

Perhaps we discuss this during the next KIP hangout?

I do see that a pluggable rack locator can be useful but I do see a few concerns:

- The RackLocator (as described in the document), implies that it can discover rack information for any node in the cluster. How does it deal with rack location changes? For example, if I moved broker id (1) from rack X to Y, I only have to start that broker with a newer rack config.
> > If RackLocator discovers broker -> rack information at start up time, any
> > change to a broker will require bouncing the entire cluster since
> > createTopic requests can be sent to any node in the cluster.
> > For this reason it may be simpler to have each node be aware of its own
> > rack and persist it in ZK during start up time.
> >
> > - A pluggable RackLocator relies on an external service being available
> > to serve rack information.
> >
> > Out of curiosity, I looked up how a couple of other systems deal with
> > zone/rack awareness.
> > For Cassandra some interesting modes are:
> > (Property File configuration)
> > http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > (Dynamic inference)
> > http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >
> > Voldemort does a static node -> zone assignment based on configuration.
> >
> > Aditya
> >
> > On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >
> > > I would like to see if we can do both:
> > >
> > > - Make RackLocator pluggable to facilitate migration with existing
> > > broker-rack mapping
> > >
> > > - Make rack an optional property for broker. If rack is available from
> > > broker, treat it as source of truth. For users with existing broker-rack
> > > mapping somewhere else, they can use the pluggable way or they can
> > > transfer the mapping to the broker rack property.
> > >
> > > One thing I am not sure is what happens at rolling upgrade when we have
> > > rack as a broker property. For brokers with older version of Kafka, will
> > > it cause problem for them? If so, is there any workaround? I also think
> > > it would be better not to have rack in the controller wire protocol but
> > > not sure if it is achievable.
> > >
> > > Thanks,
> > > Allen
> > >
> > > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >
> > > > I tend to like the idea of a pluggable locator.
> > > > For example, we already have an interface for discovering information
> > > > about the physical location of servers. I don't relish the idea of
> > > > having to maintain data in multiple places.
> > > >
> > > > -Todd
> > > >
> > > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar
> > > > <aaurad...@linkedin.com.invalid> wrote:
> > > >
> > > > > Thanks for starting this KIP Allen.
> > > > >
> > > > > I agree with Gwen that having a RackLocator class that is pluggable
> > > > > seems to be too complex. The KIP refers to potentially non-ZK storage
> > > > > for the rack info which I don't think is necessary.
> > > > >
> > > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > > > > similar to other broker properties and add a config in KafkaConfig
> > > > > called "rack".
> > > > >
> > > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > > >
> > > > > Aditya
> > > > >
> > > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > First, thanks for putting out a KIP for this. This is super important
> > > > > > for production deployments of Kafka.
> > > > > >
> > > > > > Few questions:
> > > > > >
> > > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> > > > > > balance between safety (more racks) and network utilization (traffic
> > > > > > within a rack uses the high-bandwidth TOR switch). One replica on a
> > > > > > different rack and the rest on same rack (if possible) sounds better
> > > > > > to me.
> > > > > >
> > > > > > 2) Rack-locator class seems overly complex compared to adding a
> > > > > > rack.number property to the broker properties file. Why do we want
> > > > > > that?
> > > > > >
> > > > > > Gwen
> > > > > >
> > > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello Kafka Developers,
> > > > > > >
> > > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > > > >
> > > > > > > The goal is to utilize the isolation provided by the racks in data
> > > > > > > center and distribute replicas to racks to provide fault tolerance.
> > > > > > >
> > > > > > > Comments are welcome.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Allen

--
Thanks,
Neha
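
For readers following the thread, the combination Allen suggests (treat a rack registered by the broker itself as the source of truth, and fall back to a pluggable locator otherwise) might look roughly like the Java sketch below. It is only an illustration under assumptions: RackLocator here is a hypothetical one-method interface based on the getRackInfo() call mentioned in the thread, and RackResolver, resolve(), and the sample broker ids and rack names are invented for the example. None of it is taken from the KIP or the Kafka code base.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Demo entry point; the broker ids and rack names are made up.
class RackResolutionDemo {
    public static void main(String[] args) {
        // Rack that broker 1 registered for itself after moving from rack X to rack Y.
        Map<Integer, String> registered = new TreeMap<>();
        registered.put(1, "Y");

        // External mapping that still holds the stale entry for broker 1.
        Map<Integer, String> external = new TreeMap<>();
        external.put(1, "X");
        external.put(2, "X");

        RackResolver resolver = new RackResolver(() -> external);
        List<Integer> brokers = Arrays.asList(1, 2, 3);

        // Broker 1 uses its own value, broker 2 falls back to the locator,
        // broker 3 has no known rack and is left out of the result.
        System.out.println(resolver.resolve(brokers, registered));
    }
}

// Hypothetical interface: returns the complete broker id -> rack mapping.
interface RackLocator {
    Map<Integer, String> getRackInfo();
}

// Hypothetical helper that merges the two sources of rack information.
final class RackResolver {
    private final RackLocator locator;

    RackResolver(RackLocator locator) {
        this.locator = locator;
    }

    Map<Integer, String> resolve(List<Integer> brokerIds, Map<Integer, String> registeredRacks) {
        // Ask the locator on every call so that the latest mapping is used.
        Map<Integer, String> located = locator.getRackInfo();
        Map<Integer, String> result = new TreeMap<>();
        for (Integer id : brokerIds) {
            // A rack registered by the broker itself wins over the locator's answer.
            String rack = registeredRacks.get(id);
            if (rack == null) {
                rack = located.get(id);
            }
            if (rack != null) {
                result.put(id, rack);
            }
        }
        return result;
    }
}
```

With the sample data above, the demo prints {1=Y, 2=X}: the broker's own registration overrides the stale external entry, which is the rack-move case Aditya raises, while a broker known only to the external mapping still gets a rack.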