Hi,
We are deploying a Kafka cluster with 9 brokers on EC2 instances, and we are
trying to determine the optimal number of partitions for our topics: the
maximum number we will need, so that we never have to change the partition
count later.
Our understanding was that the number of partitions shouldn't significantly
affect the CPU load of the brokers, but when we use 512 partitions instead of
128, for instance, the CPU load explodes.
We have three topics with 100,000 messages/sec each, a replication factor of 3,
and two consumer groups for each topic.
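For reference, here is a rough count of the partition replicas each broker ends up hosting with the 512-partition setting (the numbers are the ones from our setup above; the even spread across brokers is an assumption):

```python
# Back-of-the-envelope replica count for our cluster (numbers from the setup above).
topics = 3
partitions_per_topic = 512   # the higher setting we tested (vs. 128)
replication_factor = 3
brokers = 9

# Every partition has replication_factor replicas spread across the brokers.
total_replicas = topics * partitions_per_topic * replication_factor
replicas_per_broker = total_replicas // brokers  # assumes an even spread

print(total_replicas)       # 4608 partition replicas cluster-wide
print(replicas_per_broker)  # 512 replicas per broker
```

So going from 128 to 512 partitions quadruples the number of replicas (and open log segment files) each broker has to manage.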

Could somebody explain why the increase in the number of partitions has such a
dramatic impact on the CPU load?


Below I paste the Kafka broker config file:

broker.id=3

default.replication.factor=3


# The port the socket server listens on
port=9092

# The number of threads handling network requests
num.network.threads=2
 
# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=1048576

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=1048576

# The maximum size of a request that the socket server will accept
# (protection against OOM)
socket.request.max.bytes=104857600



# A comma separated list of directories under which to store log files
log.dirs=/mnt/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=16

# The minimum age of a log file to be eligible for deletion
log.retention.hours=1

# The maximum size of a log segment file. When this size is reached a new log
# segment will be created.
log.segment.bytes=536870912

# The interval at which log segments are checked to see if they can be deleted
# according to the retention policies
log.retention.check.interval.ms=60000

# By default the log cleaner is disabled and the log retention policy will
# default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual
# logs can then be marked for log compaction.
log.cleaner.enable=false

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=1000000

auto.leader.rebalance.enable=true
controlled.shutdown.enable=true


Thanks in advance.



Carles Sistare
