If there are no more comments, I would like to call for a vote.

On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com> wrote:

> KIP is updated with more details and with how to handle the situation
> where rack information is incomplete.
>
> In the situation where rack information is incomplete but we want to
> continue with the assignment, I have suggested ignoring all rack
> information and falling back to the original algorithm. The reason is
> explained below:
>
> The other options are to assume that a broker without a rack belongs to
> its own unique rack, or that all such brokers belong to one "default"
> rack. Either way we choose, it is highly likely to result in an uneven
> number of brokers per rack, and it is quite possible that the "made up"
> racks will have far fewer brokers. As I explained in the KIP, an uneven
> number of brokers per rack leads to an uneven distribution of replicas
> among brokers (even though the leader distribution is still even): the
> brokers in the rack with fewer brokers will get more replicas per broker
> than brokers in other racks.
>
> Given that the replica assignment produced would be incorrect from a rack
> aware point of view anyway, ignoring all rack information and falling
> back to the original algorithm is not a bad choice, since it at least
> gives a better guarantee of even replica distribution.
>
> Also, for command line tools, it gives the user a choice if for any
> reason they want to ignore rack information and fall back to the original
> algorithm.
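>
> To make this concrete, here is a minimal sketch (in Scala, with
> hypothetical names; not the actual KIP-36 patch) of the proposed
> fallback: the rack-aware path is taken only when every broker has a
> rack; otherwise all rack information is ignored and the original
> algorithm is reused.
>
> object AssignmentFallbackSketch {
>   // Original rack-unaware algorithm, simplified: replicas of partition p
>   // start at offset p in the sorted broker list and wrap around.
>   def roundRobinAssign(brokers: Seq[Int], nPartitions: Int, rf: Int): Map[Int, Seq[Int]] =
>     (0 until nPartitions).map { p =>
>       p -> (0 until rf).map(r => brokers((p + r) % brokers.size))
>     }.toMap
>
>   // brokerRacks maps broker id -> Option[rack]; the rack-aware algorithm
>   // is passed in as a function to keep the sketch self-contained.
>   def assign(brokerRacks: Map[Int, Option[String]], nPartitions: Int, rf: Int,
>              rackAware: (Map[Int, String], Int, Int) => Map[Int, Seq[Int]]): Map[Int, Seq[Int]] =
>     if (brokerRacks.values.forall(_.isDefined))
>       rackAware(brokerRacks.map { case (b, r) => b -> r.get }, nPartitions, rf)
>     else // incomplete mapping: ignore racks entirely
>       roundRobinAssign(brokerRacks.keys.toSeq.sorted, nPartitions, rf)
> }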
>
>
> On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com> wrote:
>
>> I have been busy with some pressing issues for the last few days. I will
>> think about how incomplete rack information affects the balance and
>> update the KIP by early next week.
>>
>> Thanks,
>> Allen
>>
>>
>> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io> wrote:
>>
>>> A few suggestions on improving the KIP:
>>>
>>> *If some brokers have rack, and some do not, the algorithm will throw an
>>> exception. This is to prevent incorrect assignment caused by user error.*
>>>
>>>
>>> In the KIP, can you clearly state the user-facing behavior when some
>>> brokers have rack information and some don't? Which actions and requests
>>> will error out, and how?
>>>
>>> *Even distribution of partition leadership among brokers*
>>>
>>>
>>> There is some information about arranging the sorted broker list
>>> interlaced with rack ids. Can you describe the changes to the current
>>> algorithm in a little more detail? How does this interlacing work if
>>> only a subset of brokers have the rack id configured? Does this still
>>> work if an uneven number of brokers is assigned to each rack? It might
>>> work; I'm looking for more details on the changes, since they will
>>> affect the behavior seen by the user - imbalance on either the leaders
>>> or the data or both.
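>>>
>>> For reference, here is my reading of the interlacing, as an assumed
>>> Scala sketch based on the KIP's description (not the actual algorithm):
>>> take one broker from each rack in turn until all brokers are consumed.
>>> With an uneven number of brokers per rack, the tail of the list is then
>>> dominated by the larger racks.
>>>
>>> def interlaceByRack(brokerRacks: Map[Int, String]): Seq[Int] = {
>>>   // Group sorted broker ids by rack, keeping each rack's brokers in order.
>>>   val perRack: Seq[Iterator[Int]] =
>>>     brokerRacks.toSeq.sortBy(_._1).groupBy(_._2).toSeq.sortBy(_._1)
>>>       .map { case (_, brokers) => brokers.map(_._1).iterator }
>>>   val out = scala.collection.mutable.ArrayBuffer.empty[Int]
>>>   while (perRack.exists(_.hasNext))
>>>     perRack.foreach(it => if (it.hasNext) out += it.next())
>>>   out.toSeq
>>> }
>>> // brokers 0,1 on rack A and 2,3 on rack B:
>>> // interlaceByRack(Map(0 -> "A", 1 -> "A", 2 -> "B", 3 -> "B")) == Seq(0, 2, 1, 3)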
>>>
>>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
>>>
>>> > I think this sounds reasonable. Anyone else have comments?
>>> >
>>> > Aditya
>>> >
>>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>> >
>>> > > During the discussion in the hangout, it was mentioned that it
>>> > > would be desirable for consumers to know the rack information of the
>>> > > brokers, so that they can consume from a broker in the same rack to
>>> > > reduce latency. As I understand it, this will only be beneficial if
>>> > > the consumer can consume from any broker in the ISR, which is not
>>> > > possible now.
>>> > >
>>> > > I suggest we skip the change to TMR. Once the consumer is changed to
>>> > > be able to consume from any broker in the ISR, the rack information
>>> > > can be added to TMR.
>>> > >
>>> > > Another thing I want to confirm is the command line behavior. I
>>> > > think the desirable default behavior is to fail fast on the command
>>> > > line for an incomplete rack mapping. The error message can include
>>> > > further instruction telling the user to add an extra argument (like
>>> > > "--allow-partial-rackinfo") to suppress the error and do an
>>> > > imperfect rack aware assignment. If the default behavior is to allow
>>> > > an incomplete mapping, the error can still be easily missed.
>>> > >
>>> > > The affected command line tools are TopicCommand and
>>> > > ReassignPartitionsCommand.
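>>> > >
>>> > > A minimal sketch of that check (hypothetical names; the
>>> > > "--allow-partial-rackinfo" flag is only a suggestion at this point):
>>> > >
>>> > > def validateRackInfo(brokerRacks: Map[Int, Option[String]],
>>> > >                      allowPartial: Boolean): Unit = {
>>> > >   val missing = brokerRacks.collect { case (id, None) => id }
>>> > >   if (missing.nonEmpty && !allowPartial)
>>> > >     throw new IllegalArgumentException(
>>> > >       s"Brokers ${missing.mkString(",")} have no rack configured; " +
>>> > >       "pass --allow-partial-rackinfo to ignore rack information")
>>> > > }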
>>> > >
>>> > > Thanks,
>>> > > Allen
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
>>> > >
>>> > > > Hi Allen,
>>> > > >
>>> > > > For TopicMetadataResponse to understand versions, you can bump up
>>> > > > the request version itself. Based on the version of the request,
>>> > > > the response can be appropriately serialized. It shouldn't be a
>>> > > > huge change. For example, we went through something similar for
>>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
>>> > > > I guess the reason protocol information is not included in the TMR
>>> > > > is because the topic itself is independent of any particular
>>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
>>> > > > even need rack information in TMR. What use case were you thinking
>>> > > > of initially?
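>>> > > >
>>> > > > As a simplified illustration of version-gated serialization
>>> > > > (made-up types, not the real Kafka protocol classes): the rack
>>> > > > field is only written for request versions that know about it, so
>>> > > > old clients never see it.
>>> > > >
>>> > > > case class BrokerMetadata(id: Int, host: String, port: Int, rack: Option[String])
>>> > > >
>>> > > > // Serialize according to the version the client sent in its request.
>>> > > > def encodeBroker(b: BrokerMetadata, requestVersion: Short): String =
>>> > > >   if (requestVersion >= 1) s"${b.id}:${b.host}:${b.port}:${b.rack.getOrElse("")}"
>>> > > >   else s"${b.id}:${b.host}:${b.port}" // version 0: no rack field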
>>> > > >
>>> > > > For 1 - I'd be fine with adding an option to the command line
>>> > > > tools that checks rack assignment, e.g. "--strict-assignment" or
>>> > > > something similar.
>>> > > >
>>> > > > Aditya
>>> > > >
>>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > >
>>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
>>> > > > > thing I have changed is removing the proposal to add rack to
>>> > > > > TopicMetadataResponse. The reason is that unlike
>>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
>>> > > > > versions. I don't see a way to include rack without breaking old
>>> > > > > versions of clients. That's probably why the security protocol
>>> > > > > is not included in TopicMetadataResponse either. I think it
>>> > > > > would be a much bigger change to include rack in
>>> > > > > TopicMetadataResponse.
>>> > > > >
>>> > > > > For 1, my concern is that doing rack aware assignment without a
>>> > > > > complete broker-to-rack mapping will result in an assignment
>>> > > > > that is not rack aware and fails to provide fault tolerance in
>>> > > > > the event of a rack outage. This kind of problem will be
>>> > > > > difficult to surface, and its cost is high: you have to do
>>> > > > > partition reassignment if you are lucky enough to spot the
>>> > > > > problem early on, or face the consequence of data loss during a
>>> > > > > real rack outage.
>>> > > > >
>>> > > > > I do see the concern with fail-fast, as it might also cause data
>>> > > > > loss if the producer is not able to produce messages due to a
>>> > > > > topic creation failure. Is it feasible to treat dynamic topic
>>> > > > > creation and command line tools differently? We could allow
>>> > > > > dynamic topic creation with an incomplete broker-rack mapping
>>> > > > > and fail fast on the command line. Another option is to let the
>>> > > > > user determine the behavior for the command line: for example,
>>> > > > > fail fast by default but allow an incomplete broker-rack mapping
>>> > > > > if another switch is provided.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>> > > > >
>>> > > > > > Hey Allen,
>>> > > > > >
>>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
>>> > > > > > creation failures while upgrading the cluster. I really doubt
>>> > > > > > we want this behavior. Ideally, this should be invisible to
>>> > > > > > clients of a cluster. Currently, each broker is effectively
>>> > > > > > its own rack, so we can probably use the rack information
>>> > > > > > whenever possible but not make it a hard requirement. To
>>> > > > > > extend Gwen's example, one badly configured broker should not
>>> > > > > > degrade topic creation for the entire cluster.
>>> > > > > >
>>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
>>> > > > > > piece to confirm that old clients will not see errors? I
>>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
>>> > > > > > from ZK. I wanted to confirm that this will not cause any
>>> > > > > > problems.
>>> > > > > >
>>> > > > > > 3. Could you elaborate on your proposed changes to the
>>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
>>> > > > > > Personally, I find this format easy to read in terms of wire
>>> > > > > > protocol changes:
>>> > > > > >
>>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>>> > > > > >
>>> > > > > > Aditya
>>> > > > > >
>>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > >
>>> > > > > > > KIP is updated to include rack as an optional property for
>>> > > > > > > a broker. Please take a look and let me know if more details
>>> > > > > > > are needed.
>>> > > > > > >
>>> > > > > > > For the case where some brokers have a rack and some do not,
>>> > > > > > > the current KIP uses the fail-fast behavior. If there are
>>> > > > > > > concerns, we can further discuss this in the email thread or
>>> > > > > > > the next hangout.
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > > >
>>> > > > > > > > That's a good question. I can think of three actions if
>>> > > > > > > > the rack information is incomplete:
>>> > > > > > > >
>>> > > > > > > > 1. Treat a node without a rack as if it is on its own
>>> > > > > > > > unique rack
>>> > > > > > > > 2. Disregard all rack information and fall back to the
>>> > > > > > > > current algorithm
>>> > > > > > > > 3. Fail fast
>>> > > > > > > >
>>> > > > > > > > Now that I think about it, options one and three make more
>>> > > > > > > > sense. The reason for fail-fast is that the user's mistake
>>> > > > > > > > of not providing the rack may never be found if we
>>> > > > > > > > tolerate it; the assignment may not be rack aware as the
>>> > > > > > > > user expected, which creates debugging problems when
>>> > > > > > > > things fail.
>>> > > > > > > >
>>> > > > > > > > What do you think? If not fail-fast, is there any way we
>>> > > > > > > > can make the user error stand out?
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >> On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
>>> > > > > > > >
>>> > > > > > > >> Thanks! Just to clarify: when some brokers have a rack
>>> > > > > > > >> assignment and some don't, do we act like none of them
>>> > > > > > > >> have it? Or like those without an assignment are in their
>>> > > > > > > >> own rack?
>>> > > > > > > >>
>>> > > > > > > >> The first scenario is good when first setting up
>>> > > > > > > >> rack-awareness, but the second makes more sense for
>>> > > > > > > >> ongoing maintenance (I can totally see someone adding a
>>> > > > > > > >> node and forgetting to set the rack property; we don't
>>> > > > > > > >> want this to change behavior for anything except the new
>>> > > > > > > >> node).
>>> > > > > > > >>
>>> > > > > > > >> What do you think?
>>> > > > > > > >>
>>> > > > > > > >> Gwen
>>> > > > > > > >>
>>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > > > >>
>>> > > > > > > >> > For scenario 1:
>>> > > > > > > >> >
>>> > > > > > > >> > - Add the rack information to the broker property file,
>>> > > > > > > >> > or dynamically set it in the wrapper code that
>>> > > > > > > >> > bootstraps the Kafka server. You would do that for all
>>> > > > > > > >> > brokers and restart the brokers one by one.
>>> > > > > > > >> >
>>> > > > > > > >> > In this scenario, the complete broker-to-rack mapping
>>> > > > > > > >> > may not be available until every broker is restarted.
>>> > > > > > > >> > During that time we fall back to the default replica
>>> > > > > > > >> > assignment algorithm.
>>> > > > > > > >> >
>>> > > > > > > >> > For scenario 2:
>>> > > > > > > >> >
>>> > > > > > > >> > - Add the rack information to the broker property file,
>>> > > > > > > >> > or dynamically set it in the wrapper code, and start
>>> > > > > > > >> > the broker.
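>>> > > > > > > >> >
>>> > > > > > > >> > For illustration, assuming the property ends up simply
>>> > > > > > > >> > being named "rack" as suggested earlier in this thread
>>> > > > > > > >> > (the final name may differ), the per-broker change is
>>> > > > > > > >> > just one line:
>>> > > > > > > >> >
>>> > > > > > > >> > # server.properties
>>> > > > > > > >> > rack=rack-1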
>>> > > > > > > >> >
>>> > > > > > > >> >
>>> > > > > > > >> > > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote:
>>> > > > > > > >> >
>>> > > > > > > >> > > Can you clarify the workflow for the following
>>> > > > > > > >> > > scenarios:
>>> > > > > > > >> > >
>>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
>>> > > > > > > >> > > information for each.
>>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify
>>> > > > > > > >> > > which rack it belongs on while adding it.
>>> > > > > > > >> > >
>>> > > > > > > >> > > Thanks!
>>> > > > > > > >> > >
>>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > > > >> > >
>>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
>>> > > > > > > >> > > > recommendation is to make rack a broker property in
>>> > > > > > > >> > > > ZooKeeper. Users with existing rack information
>>> > > > > > > >> > > > stored somewhere else would need to retrieve it at
>>> > > > > > > >> > > > broker start-up and dynamically set the rack
>>> > > > > > > >> > > > property, which can be implemented as a wrapper
>>> > > > > > > >> > > > that bootstraps the broker. There will be no
>>> > > > > > > >> > > > interface or pluggable implementation to retrieve
>>> > > > > > > >> > > > the rack information.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > The assumption is that you always need to restart
>>> > > > > > > >> > > > the broker to make a change to the rack.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
>>> > > > > > > >> > > > possible to make rack part of the metadata to help
>>> > > > > > > >> > > > the consumer choose which in-sync replica to
>>> > > > > > > >> > > > consume from, as part of a future consumer
>>> > > > > > > >> > > > enhancement.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > I will update the KIP.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > Thanks,
>>> > > > > > > >> > > > Allen
>>> > > > > > > >> > > >
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > > I attended Tuesday's KIP hangout, but this KIP
>>> > > > > > > >> > > > > was not discussed due to time constraints.
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35,
>>> > > > > > > >> > > > > I have the feeling that the incompatibility
>>> > > > > > > >> > > > > (caused by a new broker property) between brokers
>>> > > > > > > >> > > > > with different versions will be solved there. In
>>> > > > > > > >> > > > > addition, having rack in the broker properties as
>>> > > > > > > >> > > > > metadata may also help consumers in the future.
>>> > > > > > > >> > > > > So I am open to adding a rack property to the
>>> > > > > > > >> > > > > broker.
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
>>> > > > > > > >> > > > > hangout.
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > >> Can you send me the information on the next KIP
>>> > > > > > > >> > > > >> hangout?
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached.
>>> > > > > > > >> > > > >> In KafkaApis, RackLocator.getRackInfo() is
>>> > > > > > > >> > > > >> called each time the mapping is needed for auto
>>> > > > > > > >> > > > >> topic creation. This ensures the latest mapping
>>> > > > > > > >> > > > >> is used at any time.
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >> The ability to get the complete mapping makes it
>>> > > > > > > >> > > > >> simple to reuse the same interface in command
>>> > > > > > > >> > > > >> line tools.
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >>> Perhaps we can discuss this during the next
>>> > > > > > > >> > > > >>> KIP hangout?
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be
>>> > > > > > > >> > > > >>> useful, but I have a few concerns:
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> - The RackLocator (as described in the
>>> > > > > > > >> > > > >>> document) implies that it can discover rack
>>> > > > > > > >> > > > >>> information for any node in the cluster. How
>>> > > > > > > >> > > > >>> does it deal with rack location changes? For
>>> > > > > > > >> > > > >>> example, if I moved broker id (1) from rack X
>>> > > > > > > >> > > > >>> to Y, I would only have to start that broker
>>> > > > > > > >> > > > >>> with a newer rack config. If RackLocator
>>> > > > > > > >> > > > >>> discovers broker -> rack information at
>>> > > > > > > >> > > > >>> start-up time, any change to a broker will
>>> > > > > > > >> > > > >>> require bouncing the entire cluster, since
>>> > > > > > > >> > > > >>> createTopic requests can be sent to any node
>>> > > > > > > >> > > > >>> in the cluster. For this reason it may be
>>> > > > > > > >> > > > >>> simpler to have each node be aware of its own
>>> > > > > > > >> > > > >>> rack and persist it in ZK at start-up time.
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an
>>> > > > > > > >> > > > >>> external service being available to serve
>>> > > > > > > >> > > > >>> rack information.
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of
>>> > > > > > > >> > > > >>> other systems deal with zone/rack awareness.
>>> > > > > > > >> > > > >>> For Cassandra, some interesting modes are:
>>> > > > > > > >> > > > >>> (Property File configuration)
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>>> > > > > > > >> > > > >>> (Dynamic inference)
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> Voldemort does a static node -> zone
>>> > > > > > > >> > > > >>> assignment based on configuration.
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> Aditya
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> > I would like to see if we can do both:
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
>>> > > > > > > >> > > > >>> > migration with an existing broker-rack
>>> > > > > > > >> > > > >>> > mapping
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > - Make rack an optional property for the
>>> > > > > > > >> > > > >>> > broker. If rack is available from the broker,
>>> > > > > > > >> > > > >>> > treat it as the source of truth. Users with
>>> > > > > > > >> > > > >>> > an existing broker-rack mapping somewhere
>>> > > > > > > >> > > > >>> > else can use the pluggable way, or they can
>>> > > > > > > >> > > > >>> > transfer the mapping to the broker rack
>>> > > > > > > >> > > > >>> > property.
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens
>>> > > > > > > >> > > > >>> > during a rolling upgrade when we have rack as
>>> > > > > > > >> > > > >>> > a broker property. For brokers with an older
>>> > > > > > > >> > > > >>> > version of Kafka, will it cause problems? If
>>> > > > > > > >> > > > >>> > so, is there any workaround? I also think it
>>> > > > > > > >> > > > >>> > would be better not to have rack in the
>>> > > > > > > >> > > > >>> > controller wire protocol, but I'm not sure if
>>> > > > > > > >> > > > >>> > that is achievable.
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > Thanks,
>>> > > > > > > >> > > > >>> > Allen
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com> wrote:
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
>>> > > > > > > >> > > > >>> > > locator. For example, we already have an
>>> > > > > > > >> > > > >>> > > interface for discovering information about
>>> > > > > > > >> > > > >>> > > the physical location of servers. I don't
>>> > > > > > > >> > > > >>> > > relish the idea of having to maintain data
>>> > > > > > > >> > > > >>> > > in multiple places.
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> > > -Todd
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP, Allen.
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a
>>> > > > > > > >> > > > >>> > > > RackLocator class that is pluggable seems
>>> > > > > > > >> > > > >>> > > > too complex. The KIP refers to
>>> > > > > > > >> > > > >>> > > > potentially non-ZK storage for the rack
>>> > > > > > > >> > > > >>> > > > info, which I don't think is necessary.
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in ZK
>>> > > > > > > >> > > > >>> > > > under /brokers/ids/<broker_id>, similar
>>> > > > > > > >> > > > >>> > > > to other broker properties, and add a
>>> > > > > > > >> > > > >>> > > > config in KafkaConfig called "rack":
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > Aditya
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > > Hi,
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for
>>> > > > > > > >> > > > >>> > > > > this. This is super important for
>>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > A few questions:
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks
>>> > > > > > > >> > > > >>> > > > > as possible"? I'd want to balance
>>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network
>>> > > > > > > >> > > > >>> > > > > utilization (traffic within a rack uses
>>> > > > > > > >> > > > >>> > > > > the high-bandwidth TOR switch). One
>>> > > > > > > >> > > > >>> > > > > replica on a different rack and the
>>> > > > > > > >> > > > >>> > > > > rest on the same rack (if possible)
>>> > > > > > > >> > > > >>> > > > > sounds better to me.
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > 2) The rack-locator class seems overly
>>> > > > > > > >> > > > >>> > > > > complex compared to adding a
>>> > > > > > > >> > > > >>> > > > > rack.number property to the broker
>>> > > > > > > >> > > > >>> > > > > properties file. Why do we want that?
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > Gwen
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation
>>> > > > > > > >> > > > >>> > > > > > provided by the racks in the data
>>> > > > > > > >> > > > >>> > > > > > center and distribute replicas across
>>> > > > > > > >> > > > >>> > > > > > racks to provide fault tolerance.
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > Thanks,
>>> > > > > > > >> > > > >>> > > > > > Allen
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > >
>>> > > > > > > >> > >
>>> > > > > > > >> >
>>> > > > > > > >>
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Neha
>>>
>>
>>
>
