That's a good question. I can think of three actions if the rack information is incomplete:
1. Treat the node without rack as if it is on its unique rack 2. Disregard all rack information and fallback to current algorithm 3. Fail-fast Now I think about it, one and three make more sense. The reason for fail-fast is that user mistake for not providing the rack may never be found if we tolerate that and the assignment may not be rack aware as the user has expected and this creates debug problems when things fail. What do you think? If not fail-fast, is there anyway we can make the user error standing out? On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote: > Thanks! Just to clarify, when some brokers have rack assignment and some > don't, do we act like none of them have it? or like those without > assignment are in their own rack? > > The first scenario is good when first setting up rack-awareness, but the > second makes more sense for on-going maintenance (I can totally see someone > adding a node and forgetting to set the rack property, we don't want this > to change behavior for anything except the new node). > > What do you think? > > Gwen > > On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote: > > > For scenario 1: > > > > - Add the rack information to broker property file or dynamically set it > in > > the wrapper code to bootstrap Kafka server. You would do that for all > > brokers and restart the brokers one by one. > > > > In this scenario, the complete broker to rack mapping may not be > available > > until every broker is restarted. During that time we fall back to default > > replica assignment algorithm. > > > > For scenario 2: > > > > - Add the rack information to broker property file or dynamically set it > in > > the wrapper code and start the broker. > > > > > > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote: > > > > > Can you clarify the workflow for the following scenarios: > > > > > > 1. I currently have 6 brokers and want to add rack information for each > > > 2. I'm adding a new broker and I want to specify which rack it belongs > on > > > while adding it. > > > > > > Thanks! > > > > > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> > > wrote: > > > > > > > We discussed the KIP in the hangout today. The recommendation is to > > make > > > > rack as a broker property in ZooKeeper. For users with existing rack > > > > information stored somewhere, they would need to retrieve the > > information > > > > at broker start up and dynamically set the rack property, which can > be > > > > implemented as a wrapper to bootstrap broker. There will be no > > interface > > > or > > > > pluggable implementation to retrieve the rack information. > > > > > > > > The assumption is that you always need to restart the broker to make > a > > > > change to the rack. > > > > > > > > Once the rack becomes a broker property, it will be possible to make > > rack > > > > part of the meta data to help the consumer choose which in sync > replica > > > to > > > > consume from as part of the future consumer enhancement. > > > > > > > > I will update the KIP. > > > > > > > > Thanks, > > > > Allen > > > > > > > > > > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> > > wrote: > > > > > > > > > I attended Tuesday's KIP hangout but this KIP was not discussed due > > to > > > > > time constraint. > > > > > > > > > > However, after hearing discussion of KIP-35, I have the feeling > that > > > > > incompatibility (caused by new broker property) between brokers > with > > > > > different versions will be solved there. In addition, having stack > > in > > > > > broker property as meta data may also help consumers in the future. > > So > > > I > > > > am > > > > > open to adding stack property to broker. > > > > > > > > > > Hopefully we can discuss this in the next KIP hangout. > > > > > > > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> > > > > wrote: > > > > > > > > > >> Can you send me the information on the next KIP hangout? > > > > >> > > > > >> Currently the broker-rack mapping is not cached. In KafkaApis, > > > > >> RackLocator.getRackInfo() is called each time the mapping is > needed > > > for > > > > >> auto topic creation. This will ensure latest mapping is used at > any > > > > time. > > > > >> > > > > >> The ability to get the complete mapping makes it simple to reuse > the > > > > same > > > > >> interface in command line tools. > > > > >> > > > > >> > > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar < > > > > >> aaurad...@linkedin.com.invalid> wrote: > > > > >> > > > > >>> Perhaps we discuss this during the next KIP hangout? > > > > >>> > > > > >>> I do see that a pluggable rack locator can be useful but I do > see a > > > few > > > > >>> concerns: > > > > >>> > > > > >>> - The RackLocator (as described in the document), implies that it > > can > > > > >>> discover rack information for any node in the cluster. How does > it > > > deal > > > > >>> with rack location changes? For example, if I moved broker id (1) > > > from > > > > >>> rack > > > > >>> X to Y, I only have to start that broker with a newer rack > config. > > If > > > > >>> RackLocator discovers broker -> rack information at start up > time, > > > any > > > > >>> change to a broker will require bouncing the entire cluster since > > > > >>> createTopic requests can be sent to any node in the cluster. > > > > >>> For this reason it may be simpler to have each node be aware of > its > > > own > > > > >>> rack and persist it in ZK during start up time. > > > > >>> > > > > >>> - A pluggable RackLocator relies on an external service being > > > available > > > > >>> to > > > > >>> serve rack information. > > > > >>> > > > > >>> Out of curiosity, I looked up how a couple of other systems deal > > with > > > > >>> zone/rack awareness. > > > > >>> For Cassandra some interesting modes are: > > > > >>> (Property File configuration) > > > > >>> > > > > >>> > > > > > > > > > > http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html > > > > >>> (Dynamic inference) > > > > >>> > > > > >>> > > > > > > > > > > http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html > > > > >>> > > > > >>> Voldemort does a static node -> zone assignment based on > > > configuration. > > > > >>> > > > > >>> Aditya > > > > >>> > > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang < > allenxw...@gmail.com > > > > > > > >>> wrote: > > > > >>> > > > > >>> > I would like to see if we can do both: > > > > >>> > > > > > >>> > - Make RackLocator pluggable to facilitate migration with > > existing > > > > >>> > broker-rack mapping > > > > >>> > > > > > >>> > - Make rack an optional property for broker. If rack is > available > > > > from > > > > >>> > broker, treat it as source of truth. For users with existing > > > > >>> broker-rack > > > > >>> > mapping somewhere else, they can use the pluggable way or they > > can > > > > >>> transfer > > > > >>> > the mapping to the broker rack property. > > > > >>> > > > > > >>> > One thing I am not sure is what happens at rolling upgrade when > > we > > > > have > > > > >>> > rack as a broker property. For brokers with older version of > > Kafka, > > > > >>> will it > > > > >>> > cause problem for them? If so, is there any workaround? I also > > > think > > > > it > > > > >>> > would be better not to have rack in the controller wire > protocol > > > but > > > > >>> not > > > > >>> > sure if it is achievable. > > > > >>> > > > > > >>> > Thanks, > > > > >>> > Allen > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino < > tpal...@gmail.com> > > > > >>> wrote: > > > > >>> > > > > > >>> > > I tend to like the idea of a pluggable locator. For example, > we > > > > >>> already > > > > >>> > > have an interface for discovering information about the > > physical > > > > >>> location > > > > >>> > > of servers. I don't relish the idea of having to maintain > data > > in > > > > >>> > multiple > > > > >>> > > places. > > > > >>> > > > > > > >>> > > -Todd > > > > >>> > > > > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar < > > > > >>> > > aaurad...@linkedin.com.invalid> wrote: > > > > >>> > > > > > > >>> > > > Thanks for starting this KIP Allen. > > > > >>> > > > > > > > >>> > > > I agree with Gwen that having a RackLocator class that is > > > > pluggable > > > > >>> > seems > > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK > > storage > > > > >>> for the > > > > >>> > > > rack info which I don't think is necessary. > > > > >>> > > > > > > > >>> > > > Perhaps we can persist this info in zk under > > > > >>> /brokers/ids/<broker_id> > > > > >>> > > > similar to other broker properties and add a config in > > > > KafkaConfig > > > > >>> > called > > > > >>> > > > "rack". > > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, > > > "rack": > > > > >>> > "abc"} > > > > >>> > > > > > > > >>> > > > Aditya > > > > >>> > > > > > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira < > > > g...@confluent.io > > > > > > > > > >>> > wrote: > > > > >>> > > > > > > > >>> > > > > Hi, > > > > >>> > > > > > > > > >>> > > > > First, thanks for putting out a KIP for this. This is > super > > > > >>> important > > > > >>> > > for > > > > >>> > > > > production deployments of Kafka. > > > > >>> > > > > > > > > >>> > > > > Few questions: > > > > >>> > > > > > > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd > > want > > > to > > > > >>> > balance > > > > >>> > > > > between safety (more racks) and network utilization > > (traffic > > > > >>> within a > > > > >>> > > > rack > > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a > > > different > > > > >>> rack > > > > >>> > > and > > > > >>> > > > > the rest on same rack (if possible) sounds better to me. > > > > >>> > > > > > > > > >>> > > > > 2) Rack-locator class seems overly complex compared to > > > adding a > > > > >>> > > > rack.number > > > > >>> > > > > property to the broker properties file. Why do we want > > that? > > > > >>> > > > > > > > > >>> > > > > Gwen > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang < > > > > >>> allenxw...@gmail.com> > > > > >>> > > > wrote: > > > > >>> > > > > > > > > >>> > > > > > Hello Kafka Developers, > > > > >>> > > > > > > > > > >>> > > > > > I just created KIP-36 for rack aware replica > assignment. > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment > > > > >>> > > > > > > > > > >>> > > > > > The goal is to utilize the isolation provided by the > > racks > > > in > > > > >>> data > > > > >>> > > > center > > > > >>> > > > > > and distribute replicas to racks to provide fault > > > tolerance. > > > > >>> > > > > > > > > > >>> > > > > > Comments are welcome. > > > > >>> > > > > > > > > > >>> > > > > > Thanks, > > > > >>> > > > > > Allen > > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >> > > > > >> > > > > > > > > > > > > > > >