yanwei created KAFKA-2550:
-----------------------------

             Summary: [Kafka][0.8.2.1][Performance]When there are a lot of 
partition under a Topic, there are serious performance degradation.
                 Key: KAFKA-2550
                 URL: https://issues.apache.org/jira/browse/KAFKA-2550
             Project: Kafka
          Issue Type: Bug
          Components: clients, consumer, producer 
    Affects Versions: 0.8.2.1
            Reporter: yanwei
            Assignee: Neha Narkhede


Because our business needs a large number of partitions, I tested how many
partitions Kafka can support.
I found that when a topic has many partitions, performance degrades seriously.
My analysis shows that, besides the hard disk, the client is also a
bottleneck.

I profiled with JProfiler while producing and consuming 1,000,000 messages
(message size: 500 bytes).
1. Consumer high-level API (I could not upload the pictures):
     ZookeeperConsumerConnector.scala --> rebalance
     --> val assignmentContext = new AssignmentContext(group, consumerIdString,
           config.excludeInternalTopics, zkClient)
     --> ZkUtils.getPartitionsForTopics(zkClient, myTopicThreadIds.keySet.toSeq)
     --> getPartitionAssignmentForTopics
     --> Json.parseFull(jsonPartitionMap)
     1) One topic with 400 partitions:
         JProfiler: 48.6% of CPU run time
     2) One topic with 3000 partitions:
         JProfiler: 97.8% of CPU run time

  The partition map JSON (jsonPartitionMap) is probably very large, which makes
  parsing slow.
  However, this function is executed only once, so the impact should be limited.

2. Producer Scala API:
    BrokerPartitionInfo.scala --> getBrokerPartitionInfo:
    partitionMetadata.map { m =>
      m.leader match {
        case Some(leader) =>
          //y00163442 delete log print
          debug("Partition [%s,%d] has leader %d"
            .format(topic, m.partitionId, leader.id))
          new PartitionAndLeader(topic, m.partitionId, Some(leader.id))
        case None =>
          //y00163442 delete log print
          //debug("Partition [%s,%d] does not have a leader yet"
          //  .format(topic, m.partitionId))
          new PartitionAndLeader(topic, m.partitionId, None)
      }
    }.sortWith((s, t) => s.partitionId < t.partitionId)
         
      When the number of partitions exceeds 25, the 'format' call takes 44.8%
of the CPU run time, so nearly half of the time is spent in format. The format
call is executed whether or not the log print is enabled. This reduced the TPS
by a factor of five (25000 ---> 5000).
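
      To illustrate the cost described above, here is a minimal sketch that
counts how often the message is formatted when debug output is disabled. It
compares a logging method that takes an already built String with one that
takes a by-name parameter. LazyLogDemo, debugEager, debugLazy, describe and
debugEnabled are hypothetical names made up for this example. They are not
Kafka's logging code, and the sketch does not claim which of the two forms
Kafka 0.8.2.1 uses.

```scala
object LazyLogDemo {
  // Hypothetical stand-in for a log-level check; debug output is off here.
  val debugEnabled = false

  // Counts how many times the message string was actually built.
  private var formatCalls = 0

  // Eager variant: the caller builds the String before this method runs,
  // so the formatting cost is paid even though nothing is printed.
  def debugEager(msg: String): Unit =
    if (debugEnabled) println(msg)

  // By-name variant: msg is evaluated only inside the if branch,
  // so the formatting is skipped while debug output is disabled.
  def debugLazy(msg: => String): Unit =
    if (debugEnabled) println(msg)

  // Same format string as in the report, plus a call counter.
  private def describe(topic: String, partitionId: Int, leaderId: Int): String = {
    formatCalls += 1
    "Partition [%s,%d] has leader %d".format(topic, partitionId, leaderId)
  }

  // Number of format calls for n partitions with the eager variant.
  def countEager(n: Int): Int = {
    formatCalls = 0
    (0 until n).foreach(p => debugEager(describe("test", p, 1)))
    formatCalls
  }

  // Number of format calls for n partitions with the by-name variant.
  def countLazy(n: Int): Int = {
    formatCalls = 0
    (0 until n).foreach(p => debugLazy(describe("test", p, 1)))
    formatCalls
  }

  def main(args: Array[String]): Unit = {
    println(s"eager: ${countEager(5000)} format calls")   // eager: 5000 format calls
    println(s"by-name: ${countLazy(5000)} format calls")  // by-name: 0 format calls
  }
}
```

      With the eager variant the format cost grows with the number of
partitions even when nothing is logged. With the by-name variant it is paid
only when the message is actually printed.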
      
3. Producer Java client (clients module):
      Function: org.apache.kafka.clients.producer.KafkaProducer.send
      The CPU run time of 'send' rises with the number of partitions. With
5000 partitions, it is 60.8%.
      On the broker side, CPU, memory, disk and network did not reach their
limits, and the results are similar whether request.required.acks is set to 0
or 1, so I suspect there is a bottleneck in send.
      
Unfortunately the picture upload did not succeed, so the profiler results
cannot be shown here.
In my tests on a single server, a single hard disk can support 1000
partitions, and 7 hard disks can support 3000 partitions. If the client
bottleneck can be solved, I estimate that seven hard disks could support more
partitions.

In an actual production configuration, the partitions would probably be spread
across more than one topic, so things could be better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
