[ 
https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598615#comment-14598615
 ] 

Allen Wang commented on KAFKA-1215:
-----------------------------------

We now have a working solution for rack-aware assignment. It is based on the 
current patch for this JIRA, with some improvements. The key ideas of the 
solution are:

- Rack ID is a String instead of an integer
- For replica assignment, add an extra Map[Int, String] parameter to the 
assignReplicasToBrokers() method that maps broker ID to rack ID
- Before doing the rack-aware assignment, sort the broker list so that brokers 
are interlaced by rack. In other words, adjacent brokers should not be in the 
same rack if possible. For example, assuming 6 brokers mapped to 3 racks:

0 -> "rack1", 1 -> "rack1", 2 -> "rack2", 3 -> "rack2", 4 -> "rack3", 5 -> 
"rack3"

The sorted broker list could be (0, 2, 4, 1, 3, 5)

- Apply the same assignment algorithm to assign replicas, with the addition of 
skipping a broker if its rack is already used for the same partition (similar 
to what is done in the current patch)
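
As an illustration only, the interlacing step could be sketched in Python as 
below. This is not the code from the patch: the function name 
interlace_by_rack and the choice to visit racks in sorted order are 
assumptions made for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def interlace_by_rack(broker_to_rack):
    """Order broker IDs so that neighbours sit in different racks where possible.

    broker_to_rack maps broker ID -> rack ID, mirroring the Map[Int, String]
    parameter proposed above. The function name is hypothetical.
    """
    # Group broker IDs by rack, keeping IDs sorted within each rack.
    by_rack = defaultdict(list)
    for broker in sorted(broker_to_rack):
        by_rack[broker_to_rack[broker]].append(broker)

    # Take one broker from each rack per round (racks visited in sorted order).
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    return [
        broker
        for round_ in zip_longest(*columns)
        for broker in round_
        if broker is not None
    ]


racks = {0: "rack1", 1: "rack1", 2: "rack2", 3: "rack2", 4: "rack3", 5: "rack3"}
print(interlace_by_rack(racks))  # [0, 2, 4, 1, 3, 5]
```

When racks hold different numbers of brokers, the tail of the list can still 
contain neighbours from the same rack, which is where the "if possible" 
qualifier applies.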

The benefit of this approach is that replica distribution is kept as even as 
possible across all the racks and brokers.
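
To make the evenness claim concrete, here is a simplified Python sketch of a 
round-robin assignment with the rack-skipping rule. It assumes a plain 
round-robin walk from a fixed start index rather than Kafka's actual 
assignment algorithm, and assign_replicas and its parameters are hypothetical 
names.

```python
from collections import Counter


def assign_replicas(brokers, broker_to_rack, num_partitions,
                    replication_factor, start_index=0):
    """Simplified round-robin assignment that skips already-used racks.

    brokers is the (ideally interlaced) broker list. This is a sketch under
    stated assumptions, not the assignReplicasToBrokers() implementation.
    """
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor exceeds number of brokers")
    num_racks = len({broker_to_rack[b] for b in brokers})

    assignment = {}
    for partition in range(num_partitions):
        leader_pos = (start_index + partition) % n
        replicas = [brokers[leader_pos]]
        used_racks = {broker_to_rack[brokers[leader_pos]]}
        offset = 1
        while len(replicas) < replication_factor:
            candidate = brokers[(leader_pos + offset) % n]
            offset += 1
            if candidate in replicas:
                continue
            rack = broker_to_rack[candidate]
            # Skip a broker whose rack already holds a replica of this
            # partition, unless every rack is already in use.
            if rack in used_racks and len(used_racks) < num_racks:
                continue
            replicas.append(candidate)
            used_racks.add(rack)
        assignment[partition] = replicas
    return assignment


racks = {0: "rack1", 1: "rack1", 2: "rack2", 3: "rack2", 4: "rack3", 5: "rack3"}
result = assign_replicas([0, 2, 4, 1, 3, 5], racks,
                         num_partitions=6, replication_factor=3)
per_broker = Counter(b for replicas in result.values() for b in replicas)
```

With the interlaced list from the example above, every partition in this 
sketch lands on three distinct racks and every broker holds the same number of 
replicas.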

With regard to KAFKA-1792, an easy solution is to restrict replica movement 
to within the same rack, which I think should work in most practical cases. 
It also has the added benefit that replicas usually move faster within a 
rack. So basically we can apply the algorithm described in KAFKA-1792 to each 
rack separately. For example, if there are three racks, apply the algorithm 
three times, each time with the broker list and assignment for that specific 
rack. Again, we assume the broker-to-rack mapping will be available in the 
method signature.
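
A rough Python sketch of the per-rack idea follows. The KAFKA-1792 algorithm 
itself is not reproduced here; it is represented by a caller-supplied function 
rebalance_within(brokers, assignment), which is a hypothetical stand-in, as is 
the name rebalance_per_rack.

```python
from collections import defaultdict


def rebalance_per_rack(assignment, broker_to_rack, rebalance_within):
    """Run a rebalancing function once per rack so replicas never change rack.

    assignment maps partition -> list of broker IDs. rebalance_within is a
    placeholder for the KAFKA-1792 algorithm: it receives one rack's brokers
    and the part of the assignment that lives in that rack, and returns a new
    sub-assignment with the same number of replicas per partition.
    """
    brokers_by_rack = defaultdict(list)
    for broker in sorted(broker_to_rack):
        brokers_by_rack[broker_to_rack[broker]].append(broker)

    result = {p: list(replicas) for p, replicas in assignment.items()}
    for rack, rack_brokers in brokers_by_rack.items():
        # Project the current assignment onto this rack only.
        sub = {}
        for p, replicas in assignment.items():
            in_rack = [b for b in replicas if broker_to_rack[b] == rack]
            if in_rack:
                sub[p] = in_rack

        new_sub = rebalance_within(rack_brokers, sub)

        for p, old in sub.items():
            new = new_sub[p]
            if len(new) != len(old) or any(b not in rack_brokers for b in new):
                raise ValueError("replicas must stay within rack %s" % rack)
            # Put the new replicas back in the positions of the old ones.
            replacements = iter(new)
            result[p] = [
                next(replacements) if broker_to_rack[b] == rack else b
                for b in result[p]
            ]
    return result
```

Passing a function that returns its input unchanged leaves the assignment as 
it was, which is a convenient sanity check for the merge step.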

The open question is how to obtain the broker-to-rack mapping. The 
information can be supplied when Kafka registers the broker with ZooKeeper, 
which means some information has to be added to ZooKeeper. However, the rack 
information may already be available from a source outside Kafka. For 
example, in some deployments the rack information is available in a database. 
What we can do is abstract the API required to obtain rack information into 
an interface and allow the user to supply an implementation on the command 
line or at broker startup (to handle auto topic creation).
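
One possible shape for such an interface, sketched in Python purely for 
illustration: RackLocator, StaticRackLocator and the "id:rack" spec format are 
all hypothetical and do not correspond to any existing Kafka API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class RackLocator(ABC):
    """Hypothetical interface for obtaining the broker-to-rack mapping."""

    @abstractmethod
    def broker_to_rack(self) -> Dict[int, str]:
        """Return a mapping of broker ID -> rack ID."""


class StaticRackLocator(RackLocator):
    """Implementation backed by a fixed mapping, e.g. given on a command line.

    A ZooKeeper-backed or database-backed locator would implement the same
    method and could be swapped in without touching the assignment code.
    """

    def __init__(self, mapping: Dict[int, str]) -> None:
        self._mapping = dict(mapping)

    @classmethod
    def from_spec(cls, spec: str) -> "StaticRackLocator":
        # Parse an assumed "0:rack1,1:rack1,2:rack2" style string.
        mapping = {}
        for entry in spec.split(","):
            broker, rack = entry.split(":", 1)
            mapping[int(broker)] = rack.strip()
        return cls(mapping)

    def broker_to_rack(self) -> Dict[int, str]:
        return dict(self._mapping)


locator = StaticRackLocator.from_spec("0:rack1,1:rack1,2:rack2")
mapping = locator.broker_to_rack()
```

The assignment code would then depend only on broker_to_rack(), regardless of 
where the rack information comes from.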


> Rack-Aware replica assignment option
> ------------------------------------
>
>                 Key: KAFKA-1215
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1215
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.0
>            Reporter: Joris Van Remoortere
>            Assignee: Jun Rao
>             Fix For: 0.9.0
>
>         Attachments: rack_aware_replica_assignment_v1.patch, 
> rack_aware_replica_assignment_v2.patch
>
>
> Adding a rack-id to kafka config. This rack-id can be used during replica 
> assignment by using the max-rack-replication argument in the admin scripts 
> (create topic, etc.). By default the original replication assignment 
> algorithm is used because max-rack-replication defaults to -1. 
> max-rack-replication > -1 is not honored if you are doing manual replica 
> assignment (preferred).
> If this looks good I can add some test cases specific to the rack-aware 
> assignment.
> I can also port this to trunk. We are currently running 0.8.0 in production 
> and need this, so I wrote the patch against that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
