Hi Carles Just check in, did you do the benchmarks eventually on your end? Curious to know the data points.
Chen On Thu, May 21, 2015 at 2:47 PM, Carles Sistare < carles.sist...@googlemail.com> wrote: > Thanks a lot guys for your answers, > We’ll be doing some benchmarks comparing different amount of partitions > for the same load. We’ll share the results. > > Cheers > > > On 21 May 2015, at 06:04, Saladi Naidu <naidusp2...@yahoo.com.INVALID> > wrote: > > > > In general partitions are to improve throughput by parallelism. From > your explanation below yes partitions are written to different physical > locations but still append only. With write ahead buffering and append only > writes, having partitions still will increase throughput. > > Below is an excellent article by Jun Rao about partitions and its impact > > How to choose the number of topics/partitions in a Kafka cluster? > > > > | | > > | | | | | | | | > > | How to choose the number of topics/partitions in a Kafka...This is a > common question asked by many Kafka users. The goal of this post is to > explain a few important determining factors and provide a few simple > formulas. More... | > > | | > > | View on blog.confluent.io | Preview by Yahoo | > > | | > > | | > > > > Naidu Saladi > > > > From: Daniel Compton <daniel.compton.li...@gmail.com> > > To: users@kafka.apache.org > > Sent: Wednesday, May 20, 2015 8:21 PM > > Subject: Re: Optimal number of partitions for topic > > > > One of the beautiful things about Kafka is that it uses the disk and OS > > disk caching really efficiently. > > > > Because Kafka writes messages to a contiguous log, it needs very little > > seek time to move the write head to the next point. Similarly for > reading, > > if the consumers are mostly up to date with the topic then the disk head > > will be close to the reading point, or the data will already be in disk > > cache and can be read from memory. Even cooler, because disk access is so > > predictable, the OS can easily prefetch data because it knows what you're > > going to ask for in 50ms. > > > > In summary, data locality rocks. > > > > What I haven't worked out is how the data locality benefits interact with > > having multiple partitions. I would assume that this would make things > > slower because you will be writing to multiple physical locations on > disk, > > though this may be ameliorated by only fsyncing every n seconds. > > > > I'd be keen to hear how this impacts it, or even better to see some > > benchmarks. > > > > -- > > Daniel. > > > > > > > > > > On Thu, 21 May 2015 at 12:56 pm Manoj Khangaonkar <khangaon...@gmail.com > > > > wrote: > > > >> With knowing the actual implementation details, I would get guess more > >> partitions implies more parallelism, more concurrency, more threads, > more > >> files to write to - all of which will contribute to more CPU load. > >> > >> Partitions allow you to scale by partitioning the topic across multiple > >> brokers. Partition is also a unit of replication ( 1 leader + replicas > ). > >> And for consumption of messages, the order is maintained within a > >> partitions. > >> > >> But if you put 100 partitions per topic on 1 single broker, I wonder if > it > >> is going to be an overhead. > >> > >> > >> > >> On Wed, May 20, 2015 at 1:02 AM, Carles Sistare <car...@ogury.co> > wrote: > >> > >>> Hi, > >>> We are implementing a Kafka cluster with 9 brokers into EC2 instances, > >> and > >>> we are trying to find out the optimal number of partitions for our > >> topics, > >>> finding out the maximal number in order not to update the partition > >> number > >>> anymore. > >>> What we understood is that the number of partitions shouldn’t affect > the > >>> CPU load of the brokers, but when we add 512 partitions instead of 128, > >> for > >>> instance, the CPU load exploses. > >>> We have three topics with 100000 messages/sec each, a replication > factor > >>> of 3 and two consumer groups for each partition. > >>> > >>> Could somebody explain, why the increase of the number of partitions > has > >> a > >>> so dramatic impact to the CPU load? > >>> > >>> > >>> Here under i paste the config file of kafka: > >>> > >>> broker.id=3 > >>> > >>> default.replication.factor=3 > >>> > >>> > >>> # The port the socket server listens on > >>> port=9092 > >>> > >>> # The number of threads handling network requests > >>> num.network.threads=2 > >>> > >>> # The number of threads doing disk I/O > >>> num.io.threads=8 > >>> > >>> # The send buffer (SO_SNDBUF) used by the socket server > >>> socket.send.buffer.bytes=1048576 > >>> > >>> # The receive buffer (SO_RCVBUF) used by the socket server > >>> socket.receive.buffer.bytes=1048576 > >>> > >>> # The maximum size of a request that the socket server will accept > >>> (protection against OOM) > >>> socket.request.max.bytes=104857600 > >>> > >>> > >>> > >>> # A comma seperated list of directories under which to store log files > >>> log.dirs=/mnt/kafka-logs > >>> > >>> # The default number of log partitions per topic. More partitions allow > >>> greater > >>> # parallelism for consumption, but this will also result in more files > >>> across > >>> # the brokers. > >>> num.partitions=16 > >>> > >>> # The minimum age of a log file to be eligible for deletion > >>> log.retention.hours=1 > >>> > >>> # The maximum size of a log segment file. When this size is reached a > new > >>> log segment will be created. > >>> log.segment.bytes=536870912 > >>> > >>> # The interval at which log segments are checked to see if they can be > >>> deleted according > >>> # to the retention policies > >>> log.retention.check.interval.ms=60000 > >>> > >>> # By default the log cleaner is disabled and the log retention policy > >> will > >>> default to just delete segments after their retention expires. > >>> # If log.cleaner.enable=true is set the cleaner will be enabled and > >>> individual logs can then be marked for log compaction. > >>> log.cleaner.enable=false > >>> > >>> # Timeout in ms for connecting to zookeeper > >>> zookeeper.connection.timeout.ms=1000000 > >>> > >>> auto.leader.rebalance.enable=true > >>> controlled.shutdown.enable=true > >>> > >>> > >>> Thanks in advance. > >>> > >>> > >>> > >>> Carles Sistare > >>> > >>> > >>> > >> > >> > >> -- > >> http://khangaonkar.blogspot.com/ > >> > > > > -- Chen Song