How large are your messages compressed? 50k requests/sec could equate to as little as 50 KB/sec of traffic per topic, or to 50 GB/sec or more. The size of the messages is going to be pretty important when considering overall throughput here. Additionally, what kind of network interfaces are you using on your brokers?
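
To give a rough sense of how much those answers matter, here's a quick back-of-the-envelope you can plug your own numbers into (Python; the 2 KB message size, replication factor 2, and the ~12 MB/sec-per-mirror-thread figure are assumptions for illustration, not measurements from your cluster):

    # Back-of-the-envelope Kafka traffic estimate -- all inputs are assumptions,
    # adjust them to your setup.
    msg_size_kb = 2.0           # assumed average compressed message size
    msgs_per_sec = 50000        # produce rate per busy topic
    busy_topics = 3
    brokers = 3
    replication_factor = 2
    mirror_thread_mb_sec = 12   # rough ceiling for one mirror maker consumer thread

    topic_mb_sec = msg_size_kb * msgs_per_sec / 1024
    # Every byte produced is written replication_factor times across the cluster.
    per_broker_in_mb_sec = topic_mb_sec * busy_topics * replication_factor / brokers
    # Each broker also ships its share of leader partitions to the followers.
    per_broker_repl_out_mb_sec = topic_mb_sec * busy_topics * (replication_factor - 1) / brokers
    # Very rough: total produce rate divided by what one mirror maker thread can push.
    mirror_threads_needed = topic_mb_sec * busy_topics / mirror_thread_mb_sec

    print(f"per topic:                  {topic_mb_sec:6.0f} MB/sec")
    print(f"per broker inbound:         {per_broker_in_mb_sec:6.0f} MB/sec")
    print(f"per broker replication out: {per_broker_repl_out_mb_sec:6.0f} MB/sec")
    print(f"mirror maker threads:       {mirror_threads_needed:6.0f}")

With 2 KB messages those numbers land right around the figures I work through below.
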
If you're working with 400 partitions per topic, that's around 3600 partitions total, or about 2400 partitions per broker (assuming replication factor 2). This isn't an unreasonable number of partitions per broker. We regularly run twice as many partitions as this with only 64 GB of memory. I don't think partition count is your problem here.

Depending on the message size, you could easily be doing too much. Let's assume a message size of 2 KB. Given that you have 3 topics doing 50k req/sec, that would be 100 MB/sec per topic. With 3 brokers, I'm going to assume that you're running replication factor 2. This means that each broker has an inbound traffic rate equivalent to 2 of those topics, or 200 MB/sec. Already, this means that you had better be running something more than a gigabit network interface. If not, I can pin down your first problem right there. Given that you're seeing a lot of replication messages in the logs, I would start here by adding more brokers and balancing out your traffic further.

Each of your brokers also has an outbound traffic rate of 100 MB/sec just for the inter-broker replication traffic. You're running a mirror maker, which means that you have to double that. So you're doing 200 MB/sec inbound, and 200 MB/sec outbound, from each broker.

How many copies of mirror maker are you running in your consumer group? If you're only running 1 copy of mirror maker, that means that single box has to handle 300 MB/sec of inbound and outbound traffic, and it also has to decompress and recompress all of that traffic. My experience has been that a single thread of mirror maker (1 server, running with 1 consumer thread) will max out around 12 MB/sec. You're not too far off, especially if you have problems on the broker side that are reducing the efficiency of your system.

I don't think you're looking at anything specific to the kernel version or the JDK at this point; I think you may just be underpowered (depending on the answers to the questions buried in there).

-Todd


On Mon, Sep 7, 2015 at 1:08 AM, Jörg Wagner <joerg.wagn...@1und1.de> wrote:

> Thank you very much for both replies.
>
> @Tao
> Thanks, I am aware of and have read that article. I am asking because my
> experience is completely different :/. Every time we go beyond 400
> partitions the cluster really starts breaking apart.
>
> @Todd
> Thank you, very informative.
>
> Our details:
> 3 brokers: 192 GB RAM, 27 disks for log.dirs, 9 topics, and an estimated
> 50k requests/second on 3 of the topics; the others are negligible.
> Ordering is not required, messages are not keyed.
>
> The 3 main topics are one per DC (3 DCs) and are being mirrored to the
> others.
>
> The issue arises when we use over 400 partitions, which I think we require
> due to performance and mirroring. Partitions get out of sync and the log is
> being spammed by replicator messages. At the core, we start having massive
> stability issues.
>
> Additionally, the mirror maker only gets 2k messages per *minute* through
> with a stable setup of 81 partitions (for the 3 main topics).
>
> Has anyone experienced this and can give more insight? We have been doing
> testing for weeks, compared configurations and setups, without finding the
> main cause.
> Can this be a kernel (version/configuration) or Java (7) issue?
>
> Cheers
> Jörg
>
>
> On 04.09.2015 20:24, Todd Palino wrote:
>
>> Jun's post is a good start, but I find it's easier to talk in terms of
>> more concrete reasons and guidance for having fewer or more partitions
>> per topic.
>>
>> Start with the number of brokers in the cluster. This is a good baseline
>> for the minimum number of partitions in a topic, as it will assure balance
>> over the cluster. Of course, if you have lots of topics, you can
>> potentially skip past this as you'll end up with balanced load in the
>> aggregate, but I think it's a good practice regardless. As with all other
>> advice here, there are always exceptions. If you really, really, really
>> need to assure ordering of messages, you might be stuck with a single
>> partition for some use cases.
>>
>> In general, you should pick more partitions if a) the topic is very busy,
>> or b) you have more consumers. Looking at the second case first, you
>> always want to have at least as many partitions in a topic as you have
>> individual consumers in a consumer group. So if you have 16 consumers in
>> a single group, you will want the topic they consume to have at least 16
>> partitions. In fact, you may also want to always have a multiple of the
>> number of consumers so that you have even distribution. How many
>> consumers you have in a group is going to be driven more by what you do
>> with the messages once they are consumed, so here you'll be looking from
>> the bottom of your stack up, until you get to Kafka.
>>
>> How busy the topic is means looking from the top down, through the
>> producer, to Kafka. It's also a little more difficult to provide guidance
>> on. We have a policy of expanding partitions for a topic whenever the
>> size of the partition on disk (full retention over 4 days) is larger than
>> 50 GB. We find that this gives us a few benefits. One is that it takes a
>> reasonable amount of time when we need to move a partition from one
>> broker to another. Another is that when we have partitions that are
>> larger than this, the rate tends to cause problems with consumers. For
>> example, we see mirror maker perform much better, and have fewer spiky
>> lag problems, when we stay under this limit. We're even considering
>> revising the limit down a little, as we've had some reports from other
>> wildcard consumers that they've had problems keeping up with topics that
>> have partitions larger than about 30 GB.
>>
>> The last thing to look at is whether or not you are producing keyed
>> messages to the topic. If you're working with unkeyed messages, there is
>> no problem. You can usually add partitions whenever you want to down the
>> road with little coordination with producers and consumers. If you are
>> producing keyed messages, there is a good chance you do not want to
>> change the distribution of keys to partitions at various points in the
>> future when you need to size up. This means that when you first create
>> the topic, you probably want to create it with enough partitions to deal
>> with growth over time, both on the produce and consume side, even if that
>> is too many partitions right now by other measures. For example, we have
>> one client who requested 720 partitions for a particular set of topics.
>> The reasoning was that they are producing keyed messages, they wanted to
>> account for growth, and they wanted even distribution of the partitions
>> to consumers as they grow. 720 happens to have a lot of factors, so it
>> was a good number for them to pick.
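>>
>> To tie those rules of thumb together, here's a rough sketch of how you
>> might pick a partition count (Python; all the inputs are made-up
>> placeholders, just to show the arithmetic):
>>
>>     import math
>>
>>     # Placeholder inputs -- substitute your own measurements.
>>     produce_mb_per_sec = 20      # average inbound rate for the topic
>>     retention_days = 4           # full retention window
>>     max_partition_gb = 50        # keep each partition under this on disk
>>     consumers_in_group = 16      # largest consumer group reading the topic
>>     brokers = 3                  # baseline: at least one partition per broker
>>
>>     topic_gb = produce_mb_per_sec * 86400 * retention_days / 1024.0
>>     by_size = math.ceil(topic_gb / max_partition_gb)
>>
>>     # Take the largest requirement, then round up to a multiple of the
>>     # consumer count so partitions distribute evenly across the group.
>>     needed = max(by_size, consumers_in_group, brokers)
>>     partitions = math.ceil(needed / consumers_in_group) * consumers_in_group
>>
>>     print(f"size alone suggests {by_size} partitions; use {partitions}")
>>
>> For keyed topics you would then bump that up to cover expected growth and
>> pick something with plenty of factors (like the 720 above), since
>> changing the count later moves keys between partitions.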
>>
>> As a note, we have up to 5000 partitions per broker right now on current
>> hardware, and we're moving to new hardware (more disk, 256 GB of memory,
>> 10gig interfaces) where we're going to have up to 12,000. Our default
>> partition count for most clusters is 8, and we've got topics up to 512
>> partitions in some places just taking into account the produce rate alone
>> (not counting those 720-partition topics that aren't that busy). Many of
>> our brokers run with over 10k open file handles for regular files alone,
>> and over 50k open when you include network.
>>
>> -Todd
>>
>>
>> On Fri, Sep 4, 2015 at 8:11 AM, tao xiao <xiaotao...@gmail.com> wrote:
>>
>>> Here is a good doc describing how to choose the right number of
>>> partitions:
>>>
>>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>>>
>>> On Fri, Sep 4, 2015 at 10:08 PM, Jörg Wagner <joerg.wagn...@1und1.de>
>>> wrote:
>>>
>>>> Hello!
>>>>
>>>> Regarding the recommended number of partitions I am a bit confused.
>>>> Basically I got the impression that it's better to have lots of
>>>> partitions (see information from LinkedIn etc.). On the other hand, a
>>>> lot of performance benchmarks floating around show only a few
>>>> partitions being used.
>>>>
>>>> Especially when considering the difference between HDDs and SSDs, and
>>>> also the number thereof, what is the way to go?
>>>>
>>>> In my case, I seem to get the best stability and performance with few
>>>> partitions *per hdd*, and only one I/O thread per disk.
>>>>
>>>> What are your experiences and recommendations?
>>>>
>>>> Cheers
>>>> Jörg
>>>>
>>>
>>> --
>>> Regards,
>>> Tao
>>>
>
> --
> Kind regards
>
> Jörg Wagner
>
> Mobile & Services
>
> 1&1 Internet AG | Sapporobogen 6-8 | 80637 München | Germany
> Phone: +49 89 14339 324
> E-Mail: joerg.wagn...@1und1.de | Web: www.1und1.de
>
> Registered office Montabaur, Amtsgericht Montabaur, HRB 6484
>
> Management Board: Ralph Dommermuth, Frank Einhellinger, Robert Hoffmann,
> Andreas Hofmann, Markus Huhn, Hans-Henning Kettler, Uwe Lamnek, Jan Oetjen,
> Christian Würst
> Chairman of the Supervisory Board: Michael Scheeren
>
> Member of United Internet