Without knowing the actual implementation details, I would guess that more
partitions implies more parallelism, more concurrency, more threads, and more
files to write to - all of which will contribute to more CPU load.
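
As a very rough back-of-envelope sketch (using the topic, partition and
replication numbers from the mail below; the files-per-segment figure is an
illustrative assumption, not a measurement), the per-broker footprint
multiplies quickly with the partition count:

    # Sketch only, not a measurement: how partition count multiplies the
    # number of partition replicas and open segment files per broker.
    topics = 3
    partitions_per_topic = 512        # vs. 128 in the earlier test
    replication_factor = 3
    brokers = 9

    total_replicas = topics * partitions_per_topic * replication_factor
    replicas_per_broker = total_replicas / brokers          # ~512

    # Each partition replica keeps at least one active segment open
    # (a .log file plus an index), so a rough lower bound on open files:
    files_per_active_segment = 2                             # assumption
    open_files_per_broker = replicas_per_broker * files_per_active_segment

    print(replicas_per_broker, open_files_per_broker)        # ~512, ~1024
    # Roughly four times the footprint of the 128-partition configuration.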

Partitions allow you to scale by spreading a topic across multiple
brokers. A partition is also the unit of replication (1 leader + replicas).
And for consumption of messages, order is only maintained within a
partition.
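
For illustration, messages with the same key are routed to the same
partition, which is why ordering is only guaranteed per partition. A minimal
sketch of key-based assignment (the real producers use their own hash
functions, so treat this as the general idea, not the actual implementation):

    # Sketch of key-based partition assignment; the actual Kafka clients
    # use their own hash, and Python's hash() is randomized per process,
    # so this only illustrates the routing idea.
    def assign_partition(key: bytes, num_partitions: int) -> int:
        # Same key -> same partition, so ordering per key is preserved
        # within that partition, but never across partitions.
        return hash(key) % num_partitions

    for key, event in [(b"user-1", "login"), (b"user-2", "click"), (b"user-1", "logout")]:
        print(key, event, "-> partition", assign_partition(key, 128))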

But if you put 100 partitions per topic on a single broker, I wonder if
that is going to be an overhead.



On Wed, May 20, 2015 at 1:02 AM, Carles Sistare <car...@ogury.co> wrote:

> Hi,
> We are implementing a Kafka cluster with 9 brokers into EC2 instances, and
> we are trying to find out the optimal number of partitions for our topics,
> finding out the maximal number in order not to update the partition number
> anymore.
> What we understood is that the number of partitions shouldn’t affect the
> CPU load of the brokers, but when we add 512 partitions instead of 128, for
> instance, the CPU load explodes.
> We have three topics with 100000 messages/sec each, a replication factor
> of 3 and two consumer groups for each partition.
>
> Could somebody explain, why the increase of the number of partitions has a
> so dramatic impact to the CPU load?
>
>
> Here under i paste the config file of kafka:
>
> broker.id=3
>
> default.replication.factor=3
>
>
> # The port the socket server listens on
> port=9092
>
> # The number of threads handling network requests
> num.network.threads=2
>
> # The number of threads doing disk I/O
> num.io.threads=8
>
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=1048576
>
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=1048576
>
> # The maximum size of a request that the socket server will accept
> (protection against OOM)
> socket.request.max.bytes=104857600
>
>
>
> # A comma separated list of directories under which to store log files
> log.dirs=/mnt/kafka-logs
>
> # The default number of log partitions per topic. More partitions allow
> greater
> # parallelism for consumption, but this will also result in more files
> across
> # the brokers.
> num.partitions=16
>
> # The minimum age of a log file to be eligible for deletion
> log.retention.hours=1
>
> # The maximum size of a log segment file. When this size is reached a new
> log segment will be created.
> log.segment.bytes=536870912
>
> # The interval at which log segments are checked to see if they can be
> deleted according
> # to the retention policies
> log.retention.check.interval.ms=60000
>
> # By default the log cleaner is disabled and the log retention policy will
> default to just delete segments after their retention expires.
> # If log.cleaner.enable=true is set the cleaner will be enabled and
> individual logs can then be marked for log compaction.
> log.cleaner.enable=false
>
> # Timeout in ms for connecting to zookeeper
> zookeeper.connection.timeout.ms=1000000
>
> auto.leader.rebalance.enable=true
> controlled.shutdown.enable=true
>
>
> Thanks in advance.
>
>
>
> Carles Sistare
>
>
>


-- 
http://khangaonkar.blogspot.com/
