Jörg,
So, I will start with some assumptions I have which affect my suggestions
below.  I assume that the details you list are per cluster, and you have
3 clusters, one in each DC.  Each DC's cluster replicates its topic ONLY
to the other DCs (mirror maker configuration, otherwise you have circular
replication and that causes problems).  And each topic has a replication
factor of 3.  I also assume that you have a large number of producers.

So, given those assumptions, let's try to figure out how to improve
performance and fix these issues.

First, understand your partitions and reduce them if possible.  From a
throughput perspective alone, having much more than 3 partitions with
identical message rates won't provide significant additional throughput
(partitions == brokers in cluster).  If you have moderate variance between
each partition's throughput, you might stop seeing additional performance
around 9, or maybe 18 (partitions == brokers * 3 or more).  If partitions
are bursty and/or random in nature, a larger number of partitions might be
required, but keep in mind that there is always an overhead for each
partition.  The best way to measure message velocity is to look at the
rate of change of the offsets for each partition (see the sketch below).
If you notice a single partition having excessive volume, that should be
addressed.  Knowing how messages are being distributed across partitions
is very important for system design with Kafka.  (Disclaimer: there are
other reasons to have many partitions beyond producer/replication reasons.
But either way, knowing the behavior of your partitions is important
information.)
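
For example, here is a rough sketch of how I would measure that with the
stock tooling.  The broker address and topic name are placeholders, and
the paste/awk step assumes both samples list partitions in the same order:

    # Latest offset per partition; GetOffsetShell prints topic:partition:offset
    bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
      --broker-list broker1:9092 --topic my-topic --time -1 > offsets.t0
    sleep 60
    bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
      --broker-list broker1:9092 --topic my-topic --time -1 > offsets.t1
    # Diff the two samples to get messages/minute per partition
    paste -d: <(sort offsets.t0) <(sort offsets.t1) | \
      awk -F: '{print $1":"$2"  "($6 - $3)" msgs/min"}'

A lopsided partition will stand out immediately in that output.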

Second, add more brokers to spread the load around.  When you have
brokers == replication factor, you will see similar system-wide throughput
as a single broker with no replication (assuming producer acks == 1),
but, since a lot of network bandwidth is used for replication from each
broker, you could easily max out your network.  So adding more brokers to
each cluster could give you both a throughput boost and a response time
drop.  The only downside to adding more brokers is cost.  But you could
drop the cost of your cluster in a few ways.  Mostly, you need to know how
much message volume needs to stay in the head of your log (i.e. hot in
page cache).  If all your processes are streaming applications, 1 min of
worst-case message volume should be more than enough RAM to reserve for
page cache for a large recovery window.  Given your specs, and assuming an
average message size of 1 KB, you would only need around 8 GB beyond the
servers' baseline memory usage (rough arithmetic below).  If you are
consuming for a batch-based operation (Hadoop MR, Spark jobs), I would
accept the fact that you would have to go to disk at each batch interval,
but to quote the manual, "don't fear the filesystem!"
http://kafka.apache.org/documentation.html#persistence   If you are mixed,
using around 2x-4x that recovery window should be more than enough.
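
To make that figure concrete, a quick back-of-the-envelope (the 50k
messages/sec and 1 KB average size are assumptions taken from your
numbers, not measurements):

    50,000 msg/s * 1 KB   ~= 50 MB/s of incoming data per cluster
    50 MB/s * 60 s        ~= 3 GB for a one-minute recovery window
    2x-4x that window     ~= 6-12 GB of extra page cache headroom

which is roughly where "around 8 GB beyond baseline" comes from.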

Third, consider reconfiguring your brokers.  Looking at the configs for a
broker, setting replica.fetch.max.bytes higher might help a bit to reduce
network overhead for replication.  Also, increasing num.replica.fetchers
should increase how many fetcher threads each broker uses to follow its
partitions.  Keep in mind, given your current assumed setup, every node is
receiving 100% of all messages, and 66% of that arrives through the
replica fetchers.  By comparison, Kafka defaults to 1 replica fetcher
(num.replica.fetchers) and 3 threads for regular network requests
(num.network.threads).
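
As a sketch, something like this in server.properties; the numbers are
illustrative starting points to tune against your own measurements, not
recommendations:

    # Allow larger replication fetches to cut per-request overhead.
    # Keep this >= message.max.bytes so large messages can still replicate.
    replica.fetch.max.bytes=4194304
    # More fetcher threads so each follower keeps up with more partitions in parallel.
    num.replica.fetchers=4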

Fourth, make sure you configure and set up mirror maker correctly
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330).
Make sure that it's not a circle of replication first!  Make sure that
your consumer configs are well set up for inter-DC communication
(http://kafka.apache.org/documentation.html#consumerconfigs).  This means
longer timeouts, larger fetch requests, more queued chunks
(queued.max.message.chunks), and a larger minimum fetch size.  For the
producer config, it depends on your requirements, so I can't speak as much
to them.  Most importantly, make sure your mirror maker cluster is close
to the target Kafka cluster.  And, for good measure, look at inter-DC
bandwidth to make sure that you aren't maxing out your connection, even
though that's very unlikely at 2k/min.
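
For what it's worth, a minimal sketch of that kind of setup.  Hostnames,
topic pattern, and all values are placeholders, and it assumes the 0.8.x
high-level consumer that mirror maker uses:

    # consumer.properties (pointing at the source DC)
    zookeeper.connect=source-dc-zk:2181
    group.id=mm-dc1-to-dc2
    # tolerate higher inter-DC latency
    socket.timeout.ms=60000
    # bigger fetches, more buffered chunks
    fetch.message.max.bytes=4194304
    queued.max.message.chunks=10
    # wait for a decent amount of data before returning a fetch
    fetch.min.bytes=65536
    fetch.wait.max.ms=1000

    # Run this close to the *target* cluster:
    bin/kafka-mirror-maker.sh --consumer.config consumer.properties \
      --producer.config producer.properties --num.streams 4 \
      --whitelist 'dc1-.*'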

Just as a good practice, add more topics.  This won't affect performance
directly, but it might make administration and consumption easier in many
respects, unless all the messages in the DC represent the same message
flow…
-Erik



On 9/7/15, 3:32 AM, "Jörg Wagner" <joerg.wagn...@1und1.de> wrote:

>Sorry, the messages are keyed.
>
>On 07.09.2015 10:08, Jörg Wagner wrote:
>> Thank you very much for both replies.
>>
>> @Tao
>> Thanks, I am aware of and have read that article. I am asking because
>> my experience is completely different :/. Everytime we go beyond 400
>> partitions the cluster really starts breaking apart.
>>
>> @Todd
>> Thank you, very informative.
>>
>> Our details:
>> 3 Brokers: 192GB Ram, 27 Disks for log.dirs, 9 topics and estimated
>> 50k requests / second on 3 of the topics, the others are negligible.
>> Ordering is not required, messages are not keyed
>>
>> The 3 main topics are one per DC (3 DCs) and being mirrored to the
>> others.
>>
>>
>> The issue arises when we use over 400 partitions, which I think we
>> require due to performance and mirroring. Partitions get out of sync
>> and the log is being spammed by replicator messages. At the core, we
>> start having massive stability issues.
>>
>> Additionally, the mirrormaker only gets 2k Messages per *minute*
>> through with a stable setup of 81 partitions (for the 3 main topics).
>>
>> Has anyone experienced this and can give more insight? We have been
>> doing testing for weeks, compared configuration and setups, without
>> finding the main cause.
>> Can this be a Kernel (version/configuration) or Java(7) issue?
>>
>> Cheers
>> Jörg
>>
>>
>> On 04.09.2015 20:24, Todd Palino wrote:
>>> Jun's post is a good start, but I find it's easier to talk in terms
>>> of more
>>> concrete reasons and guidance for having fewer or more partitions per
>>> topic.
>>>
>>> Start with the number of brokers in the cluster. This is a good
>>>baseline
>>> for the minimum number of partitions in a topic, as it will assure
>>> balance
>>> over the cluster. Of course, if you have lots of topics, you can
>>> potentially skip past this as you'll end up with balanced load in the
>>> aggregate, but I think it's a good practice regardless. As with all
>>> other
>>> advice here, there are always exceptions. If you really, really, really
>>> need to assure ordering of messages, you might be stuck with a single
>>> partition for some use cases.
>>>
>>> In general, you should pick more partitions if a) the topic is very
>>> busy,
>>> or b) you have more consumers. Looking at the second case first, you
>>> always
>>> want to have at least as many partitions in a topic as you have
>>> individual
>>> consumers in a consumer group. So if you have 16 consumers in a single
>>> group, you will want the topic they consume to have at least 16
>>> partitions.
>>> In fact, you may also want to always have a multiple of the number of
>>> consumers so that you have even distribution. How many consumers you
>>> have
>>> in a group is going to be driven more by what you do with the
>>> messages once
>>> they are consumed, so here you'll be looking from the bottom of your
>>> stack
>>> up, until you get to Kafka.
>>>
>>> How busy the topic is is looking from the top down, through the
>>> producer,
>>> to Kafka. It's also a little more difficult to provide guidance on.
>>> We have
>>> a policy of expanding partitions for a topic whenever the size of the
>>> partition on disk (full retention over 4 days) is larger than 50 GB. We
>>> find that this gives us a few benefits. One is that it takes a
>>> reasonable
>>> amount of time when we need to move a partition from one broker to
>>> another.
>>> Another is that when we have partitions that are larger than this,
>>> the rate
>>> tends to cause problems with consumers. For example, we see mirror
>>>maker
>>> perform much better, and have less spiky lag problems, when we stay
>>> under
>>> this limit. We're even considering revising the limit down a little, as
>>> we've had some reports from other wildcard consumers that they've had
>>> problems keeping up with topics that have partitions larger than
>>> about 30
>>> GB.
>>>
>>> The last thing to look at is whether or not you are producing keyed
>>> messages to the topic. If you're working with unkeyed messages, there
>>> is no
>>> problem. You can usually add partitions whenever you want to down the
>>> road
>>> with little coordination with producers and consumers. If you are
>>> producing
>>> keyed messages, there is a good chance you do not want to change the
>>> distribution of keys to partitions at various points in the future
>>> when you
>>> need to size up. This means that when you first create the topic, you
>>> probably want to create it with enough partitions to deal with growth
>>> over
>>> time, both on the produce and consume side, even if that is too many
>>> partitions right now by other measures. For example, we have one
>>> client who
>>> requested 720 partitions for a particular set of topics. The
>>> reasoning was
>>> that they are producing keyed messages, they wanted to account for
>>> growth,
>>> and they wanted even distribution of the partitions to consumers as
>>>they
>>> grow. 720 happens to have a lot of factors, so it was a good number for
>>> them to pick.
>>>
>>> As a note, we have up to 5000 partitions per broker right now on
>>>current
>>> hardware, and we're moving to new hardware (more disk, 256 GB of
>>>memory,
>>> 10gig interfaces) where we're going to have up to 12,000. Our default
>>> partition count for most clusters is 8, and we've got topics up to 512
>>> partitions in some places just taking into account the produce rate
>>> alone
>>> (not counting those 720-partition topics that aren't that busy). Many
>>>of
>>> our brokers run with over 10k open file handles for regular files
>>>alone,
>>> and over 50k open when you include network.
>>>
>>> -Todd
>>>
>>>
>>>
>>> On Fri, Sep 4, 2015 at 8:11 AM, tao xiao <xiaotao...@gmail.com> wrote:
>>>
>>>> Here is a good doc to describe how to choose the right number of
>>>> partitions
>>>>
>>>>
>>>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>>>>
>>>>
>>>> On Fri, Sep 4, 2015 at 10:08 PM, Jörg Wagner <joerg.wagn...@1und1.de>
>>>> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> Regarding the recommended amount of partitions I am a bit confused.
>>>>> Basically I got the impression that it's better to have lots of
>>>> partitions
>>>>> (see information from linkedin etc). On the other hand, a lot of
>>>>> performance benchmarks floating around show only a few partitions are
>>>> being
>>>>> used.
>>>>>
>>>>> Especially when considering the difference between hdd and ssds and
>>>>> also
>>>>> the amount thereof, what is the way to go?
>>>>>
>>>>> In my case, I seem to have the fewest stability and performance
>>>>> issues with
>>>>> few partitions *per hdd*, and only one io thread per disk.
>>>>>
>>>>> What are your experiences and recommendations?
>>>>>
>>>>> Cheers
>>>>> Jörg
>>>>>
>>>>
>>>>
>>>> -- 
>>>> Regards,
>>>> Tao
>>>>
>>
>
>-- 
>Kind regards
>
>Jörg Wagner
>
>  
>Mobile & Services
>
>1&1 Internet AG | Sapporobogen 6-8 | 80637 München | Germany
>Phone: +49 89 14339 324
>E-Mail: joerg.wagn...@1und1.de | Web: www.1und1.de
>
>
