Lucas Bradstreet created KAFKA-7410:
---------------------------------------

             Summary: Rack aware partitions assignment create unbalanced broker 
assignments on unbalanced racks
                 Key: KAFKA-7410
                 URL: https://issues.apache.org/jira/browse/KAFKA-7410
             Project: Kafka
          Issue Type: Bug
          Components: admin
    Affects Versions: 1.1.1
            Reporter: Lucas Bradstreet
         Attachments: AdminUtilsTest.scala

AdminUtils creates a poorly balanced partition assignment when the number of 
brokers per rack is uneven, e.g. 80 brokers on rack A, 20 on rack B, and 15 on 
rack C. In such a scenario, a single broker from rack C may be assigned over 
and over again, even though more balanced allocations exist.

kafka.admin.AdminUtils.getRackAlternatedBrokerList is supposed to create a list 
of brokers alternating by rack. However, once it runs out of brokers on the 
racks with fewer brokers, it ends up placing a run of brokers from the same 
rack together, because rackIterator.hasNext returns false for the other racks.
{code:java}
while (result.size < brokerRackMap.size) {
  val rackIterator = brokersIteratorByRack(racks(rackIndex))
  if (rackIterator.hasNext)
    result += rackIterator.next()
  rackIndex = (rackIndex + 1) % racks.length
}{code}
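
To make the trailing run concrete, here is a minimal, self-contained sketch of the same round-robin loop. The object name, the helper rackAlternatedBrokerList, the grouping code around the loop, and the 1/2/6 broker layout are assumptions made for illustration; they are not taken from AdminUtils or from the attached test.

```scala
import scala.collection.mutable

object RackAlternationSketch {
  // Simplified stand-in for the loop quoted above: round-robin over the
  // sorted racks, taking one broker per visit while that rack has any left.
  def rackAlternatedBrokerList(brokerRackMap: Map[Int, String]): IndexedSeq[Int] = {
    // Group broker ids by rack and keep one stateful iterator per rack.
    val brokersIteratorByRack: Map[String, Iterator[Int]] =
      brokerRackMap.toSeq
        .groupBy { case (_, rack) => rack }
        .map { case (rack, pairs) => rack -> pairs.map(_._1).sorted.iterator }
    val racks = brokersIteratorByRack.keys.toArray.sorted
    val result = mutable.ArrayBuffer[Int]()
    var rackIndex = 0
    while (result.size < brokerRackMap.size) {
      val rackIterator = brokersIteratorByRack(racks(rackIndex))
      if (rackIterator.hasNext)
        result += rackIterator.next()
      rackIndex = (rackIndex + 1) % racks.length
    }
    result.toIndexedSeq
  }

  // Hypothetical cluster: 1 broker on rack A, 2 on rack B, 6 on rack C.
  val brokerRackMap: Map[Int, String] =
    Map(0 -> "A", 1 -> "B", 2 -> "B") ++ (3 to 8).map(_ -> "C")

  def main(args: Array[String]): Unit = {
    val alternated = rackAlternatedBrokerList(brokerRackMap)
    // prints: 0(A) 1(B) 3(C) 2(B) 4(C) 5(C) 6(C) 7(C) 8(C)
    println(alternated.map(b => s"$b(${brokerRackMap(b)})").mkString(" "))
  }
}
```

In this layout, racks A and B are exhausted after the first four entries, so the last five entries of the list all come from rack C.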
When assignReplicasToBrokersRackAware reaches the run of brokers from the same 
rack and chooses the replicas to go along with a leader on the rack with the 
most brokers, e.g. C, it skips all of the C brokers until it wraps around to 
the start of the alternated list, and then chooses the first broker in that 
list.

 
{code:java}
if ((!racksWithReplicas.contains(rack) || racksWithReplicas.size == numRacks)
    && (!brokersWithReplicas.contains(broker) || brokersWithReplicas.size == numBrokers)) {
  replicaBuffer += broker
  racksWithReplicas += rack
  brokersWithReplicas += broker
  done = true
}
k += 1
{code}
It does so for each of the remaining brokers on C, choosing the first broker 
in the alternated list each time until it has allocated all of the partitions.
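
The effect of that skipping can be sketched in isolation. The example below is a deliberate simplification, not the AdminUtils code: it assumes a replication factor of 2, one partition per leader position, and no replica shift, and it hard-codes a hypothetical alternated list (1 broker on rack A, 2 on rack B, 6 on rack C) that ends in a run of rack C brokers. All names in it are illustrative.

```scala
object ReplicaSkipSketch {
  // Hypothetical rack-alternated list that ends in a run of rack C brokers.
  val alternated: IndexedSeq[Int] = IndexedSeq(0, 1, 3, 2, 4, 5, 6, 7, 8)
  val rackOf: Map[Int, String] =
    Map(0 -> "A", 1 -> "B", 2 -> "B") ++ (3 to 8).map(_ -> "C")

  // Simplified follower choice for replication factor 2: starting just after
  // the leader, walk the list (wrapping around) and take the first broker on
  // a different rack. The replica shift used by the real code is omitted.
  def followerFor(leaderIndex: Int): Int = {
    val leaderRack = rackOf(alternated(leaderIndex))
    var k = 1
    while (rackOf(alternated((leaderIndex + k) % alternated.size)) == leaderRack)
      k += 1
    alternated((leaderIndex + k) % alternated.size)
  }

  // One partition per leader position; count how often each broker is
  // picked as the follower.
  def followerCounts: Map[Int, Int] =
    alternated.indices.map(followerFor)
      .groupBy(identity)
      .map { case (broker, picks) => broker -> picks.size }

  def main(args: Array[String]): Unit =
    followerCounts.toSeq.sortBy(_._1).foreach { case (broker, n) =>
      println(s"broker $broker: $n follower replicas")
    }
}
```

Under these assumptions, every leader in the rack C run skips the rest of the run, wraps around, and lands on broker 0. Broker 0 therefore receives 5 of the 9 follower replicas, while brokers 1 to 4 receive one each and brokers 5 to 8 receive none.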

See the attached sample code for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
