Re: Topics, partitions and keys

Todd Palino Sun, 29 May 2016 17:53:04 -0700

Answers are in-line below.

-Todd

On Sun, May 29, 2016 at 3:00 PM, Igor Kravzov <igork.ine...@gmail.com>
wrote:

> Please help me with the subject.
> In Kafka documentations I found the following:
>
> *Kafka only provides a total order over messages within a partition, not
> between different partitions in a topic. Per-partition ordering combined
> with the ability to partition data by key is sufficient for most
> applications. However, if you require a total order over messages this can
> be achieved with a topic that has only one partition, though this will mean
> only one consumer process per consumer group.*
>
> So here are my questions:
> 1. Does it mean if i want to have more than 1 consumer (from the same
> group) reading from the same topic I need to have more than 1 partition?
>

Yes

> 2. Does it mean that I need the same amount of partitions as amount of
> consumers for the same group?
>

If you want all your consumers to be actively consuming, then you need at
least as many partitions as you have consumers. You can have more
partitions than consumers, and all partitions will still be consumed,
with some consumers reading from more than one. You can also have more
consumers than partitions: the extras sit idle as “warm spares”, ready
to pick up if others drop out.
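
As a rough illustration of how partitions get spread across the consumers in
one group, here is a toy round-robin assignment. This is only a sketch: it is
not Kafka's actual group assignment logic, and the function name and the
consumer names (c1, c2, ...) are made up for the example.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out one at a time; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(4))  # a topic with four partitions, numbered 0-3

# Fewer consumers than partitions: every partition is still consumed.
print(assign_round_robin(partitions, ["c1", "c2"]))

# More consumers than partitions: c5 gets nothing and acts as a warm spare.
print(assign_round_robin(partitions, ["c1", "c2", "c3", "c4", "c5"]))
```

In the second call the fifth consumer ends up with an empty list, which is the
"warm spare" case.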

> 3. How many consumers can read from one partition?
>

Only one consumer in a given consumer group can read from a given
partition. You can have as many consumer groups reading the same topic as
your system is able to handle (limited by, for example, how much network
bandwidth you have).

> Also have some questions regarding relationship between keys and partitions
> with regard to API. I only looked at .net APIs (especially one from MS)
> but it looks like they mimic the Java API.
> When using a producer to send a message to a topic there is a key
> parameter. But when consumer reads from a topic there is a partition
> number.
>
> 1. How are partitions numbered? Starting from 0 or 1?
>

Starting from zero. A topic with eight partitions will have them numbered
zero through seven.

> 2. What exactly relationship between a key and partition?
> As I understand some function on key will determine a partition. is that
> correct?
>

When using keys, the default partitioner will hash the key and write the
message to a partition based on the hash. All messages produced with that same key
will be written to the same partition. This is useful, for example, if you
want to make sure all messages that deal with a particular user are in the
same partition to optimize processing on your consumers.

Note that the hashing and partition assignment will be consistent as long
as the number of partitions remains the same. If you change the number of
partitions for the topic, the assignments of keys to partitions will change.
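
To make the "hash, then pick a partition" idea concrete, here is a minimal
sketch. It assumes the partition is simply the hash modulo the partition
count. Kafka's Java client uses a murmur2 hash of the serialized key; CRC32 is
swapped in here only to keep the example self-contained, so the actual
partition numbers will not match what Kafka produces. The function name and
the user-N keys are made up for the example.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash the key, then take it modulo the partition count."""
    # CRC32 is a stand-in hash for illustration, not the hash Kafka uses.
    return zlib.crc32(key) % num_partitions


keys = [f"user-{i}".encode() for i in range(100)]

# Same key, same partition count: the answer never changes.
assert all(partition_for(k, 8) == partition_for(k, 8) for k in keys)

# Change the partition count and many keys land somewhere else.
moved = sum(1 for k in keys if partition_for(k, 8) != partition_for(k, 9))
print(f"{moved} of {len(keys)} keys map to a different partition after going from 8 to 9")
```

The second half is why adding partitions to a keyed topic needs some care:
messages for a given key that were produced before the change may sit in a
different partition from the ones produced after it.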

> 3. If I have 2 partitions in a topic and want some particular messages go
> to one partition and other messages go to another I should use a specific
> key for one specific partition, and the rest for another?
>

For 2 partitions, it is possible, but not guaranteed, that if you produce
messages with 2 different keys they will end up in different partitions. If
you truly want to assure partitioning like that, you will need to use a
custom partitioner. This will allow you to specify, in the producer,
exactly what partition each message gets produced to.

> 4. What if I have 3 partitions and one type of messages to go to one
> particular partition and the rest to other two?
>

Again, this is a case where you need to use a custom partitioner if you
want to get fancy.
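
The kind of rule a custom partitioner would encode can be sketched as a plain
function. This is only an illustration of the logic, not tied to any
particular client's partitioner interface; the "audit" message type, the
function name, and the CRC32 hash are all assumptions made for the example.

```python
import zlib


def choose_partition(message_type: str, key: bytes, num_partitions: int) -> int:
    """Pin one message type to partition 0 and hash everything else over the rest."""
    if message_type == "audit" or num_partitions == 1:
        # The hypothetical "audit" type always goes to partition 0.
        return 0
    # All other messages are spread over partitions 1 .. num_partitions - 1.
    return 1 + zlib.crc32(key) % (num_partitions - 1)


# Two partitions: one type goes to partition 0, everything else to partition 1.
print(choose_partition("audit", b"user-1", 2), choose_partition("click", b"user-1", 2))

# Three partitions: one type goes to partition 0, the rest share partitions 1 and 2.
print(choose_partition("audit", b"user-1", 3), choose_partition("click", b"user-1", 3))
```

A consumer that only cares about the pinned type can then read partition 0
and ignore the others.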

> 5. How in general I send messages to a particular partition in order to
> know  for a consumer from where to read?
> Or I better off with multiple topics?
>

Either way is OK. I usually use the guideline that if the messages are of
different types (e.g. one is a page view event, and one is a search event),
they should probably be in different topics named appropriately. This
allows your consumers to know exactly what they are dealing with, and you
never know when you’ll have a consumer that will care about one and not the
other. You want to minimize a consumer reading messages and throwing them
away because they’re not what it wants to be reading.

Doing something like having a topic per-user can be very problematic as the
scale starts to increase. Yes, you can certainly use a wildcard consumer,
but if you’re not doing that you have to maintain some mapping of consumers
to topics. And if you are using a wildcard consumer, you’re going to run
into issues with the number of topics any given group is consuming at some
point. Your system may work fine for 5 topics, but what about when it grows
to 100? 1000? A million?

>
> Thanks in advance.
>

*—-*
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming

linkedin.com/in/toddpalino
