Without knowing the actual implementation details, I would guess that more partitions imply more parallelism, more concurrency, more threads, and more files to write to, all of which would contribute to more CPU load.
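To make that guess concrete, here is a hedged back-of-envelope sketch (illustrative arithmetic only, not measured Kafka internals): the number of partition replicas a broker hosts, and therefore the number of log files it must keep open and flush, grows linearly with the partition count. The cluster numbers below are taken from the original post (9 brokers, 3 topics, replication factor 3); the even spread across brokers is an assumption.

```python
# Back-of-envelope: how the per-broker replica count grows with partitions.
# Illustrative only; actual CPU cost depends on Kafka internals not modeled here.

def replicas_per_broker(topics, partitions_per_topic, replication_factor, brokers):
    """Total partition replicas in the cluster, assumed spread evenly across brokers."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return total_replicas / brokers

# The poster's cluster: 9 brokers, 3 topics, replication factor 3.
for parts in (128, 512):
    per_broker = replicas_per_broker(3, parts, 3, 9)
    print(f"{parts} partitions/topic -> {per_broker:.0f} replicas per broker")
```

Going from 128 to 512 partitions per topic quadruples the replicas (and open log segment files) each broker manages, which is consistent with the CPU increase described below.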
Partitions allow you to scale by spreading a topic across multiple brokers. A partition is also the unit of replication (1 leader + replicas). And for consumption, message order is maintained only within a partition. But if you put 100 partitions per topic on a single broker, I wonder whether that becomes an overhead.

On Wed, May 20, 2015 at 1:02 AM, Carles Sistare <car...@ogury.co> wrote:

> Hi,
> We are implementing a Kafka cluster with 9 brokers on EC2 instances, and
> we are trying to find the optimal number of partitions for our topics:
> the maximal number, so that we never have to update the partition count
> again.
> What we understood is that the number of partitions shouldn't affect the
> CPU load of the brokers, but when we use 512 partitions instead of 128,
> for instance, the CPU load explodes.
> We have three topics with 100000 messages/sec each, a replication factor
> of 3, and two consumer groups for each partition.
>
> Could somebody explain why the increase in the number of partitions has
> such a dramatic impact on the CPU load?
>
> Here I paste the kafka config file:
>
> broker.id=3
>
> default.replication.factor=3
>
> # The port the socket server listens on
> port=9092
>
> # The number of threads handling network requests
> num.network.threads=2
>
> # The number of threads doing disk I/O
> num.io.threads=8
>
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=1048576
>
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=1048576
>
> # The maximum size of a request that the socket server will accept
> # (protection against OOM)
> socket.request.max.bytes=104857600
>
> # A comma separated list of directories under which to store log files
> log.dirs=/mnt/kafka-logs
>
> # The default number of log partitions per topic. More partitions allow
> # greater parallelism for consumption, but this will also result in more
> # files across the brokers.
> num.partitions=16
>
> # The minimum age of a log file to be eligible for deletion
> log.retention.hours=1
>
> # The maximum size of a log segment file. When this size is reached a new
> # log segment will be created.
> log.segment.bytes=536870912
>
> # The interval at which log segments are checked to see if they can be
> # deleted according to the retention policies
> log.retention.check.interval.ms=60000
>
> # By default the log cleaner is disabled and the log retention policy will
> # default to just delete segments after their retention expires.
> # If log.cleaner.enable=true is set the cleaner will be enabled and
> # individual logs can then be marked for log compaction.
> log.cleaner.enable=false
>
> # Timeout in ms for connecting to zookeeper
> zookeeper.connection.timeout.ms=1000000
>
> auto.leader.rebalance.enable=true
> controlled.shutdown.enable=true
>
> Thanks in advance.
>
> Carles Sistare

--
http://khangaonkar.blogspot.com/
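To illustrate the point above about ordering being maintained only within a partition, here is a minimal sketch of key-based partitioning: messages with the same key always land in the same partition, so their relative order is preserved there. This mimics the idea only; the stand-in hash function below is an assumption for illustration, not Kafka's actual default partitioner (which hashes the key bytes with murmur2).

```python
# Sketch: same key -> same partition, so per-key order is preserved within
# a partition. The hash below is a simple stand-in, not Kafka's partitioner.
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration only.
    return sum(key.encode()) % num_partitions

partitions = defaultdict(list)
for seq, key in enumerate(["user-a", "user-b", "user-a", "user-b", "user-a"]):
    partitions[partition_for(key, 4)].append((key, seq))

# Within each partition, messages for a given key appear in send order;
# across partitions, no global order is guaranteed.
for p, msgs in sorted(partitions.items()):
    print(f"partition {p}: {msgs}")
```

Consumers reading different partitions may interleave keys arbitrarily, which is why a topic-wide ordering guarantee requires a single partition.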