Re: Topics, partitions and keys

Igor Kravzov Mon, 30 May 2016 07:19:20 -0700

Thank you, Todd, for the great response.

On Sun, May 29, 2016 at 8:52 PM, Todd Palino <tpal...@gmail.com> wrote:


> Answers are in-line below.
>
> -Todd
>
> On Sun, May 29, 2016 at 3:00 PM, Igor Kravzov <igork.ine...@gmail.com>
> wrote:
>
> > Please help me with the subject.
> > In the Kafka documentation I found the following:
> >
> > *Kafka only provides a total order over messages within a partition, not
> > between different partitions in a topic. Per-partition ordering combined
> > with the ability to partition data by key is sufficient for most
> > applications. However, if you require a total order over messages this
> > can be achieved with a topic that has only one partition, though this
> > will mean only one consumer process per consumer group.*
> >
> > So here are my questions:
> > 1. Does it mean that if I want to have more than 1 consumer (from the
> > same group) reading from the same topic I need to have more than 1
> > partition?
> >
>
> Yes
>
>
> > 2. Does it mean that I need the same number of partitions as I have
> > consumers in the same group?
> >
>
> If you want all your consumers to be actively consuming, then you need at
> least as many partitions as you have consumers. You can have more
> partitions than you do consumers, and you can be assured that all
> partitions will be consumed. Note that you can also run more consumers
> than partitions: the extra consumers sit idle as “warm spares”, ready
> to pick up if others drop out.
>
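The consumer/partition arithmetic above can be sketched in a few lines. This is only a toy model, assuming a simple round-robin spread; `assign_partitions` and the consumer names are made up for illustration and this is not the assignment logic Kafka itself runs.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition numbers over the consumers of ONE group, round-robin.

    Illustrative only: a stand-in for whatever assignor the real client uses.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        # Each partition gets exactly one owner within the group.
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# More partitions than consumers: every partition is still consumed.
print(assign_partitions(8, ["c1", "c2", "c3"]))
# More consumers than partitions: c3 owns nothing and acts as a warm spare.
print(assign_partitions(2, ["c1", "c2", "c3"]))
```

In the second call the third consumer ends up with an empty list, which is the "warm spare" situation described above.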
>
> > 3. How many consumers can read from one partition?
> >
>
> Only one consumer in a given consumer group can read from a single
> partition. You can have as many consumer groups reading the same topic as
> your system is able to handle (limited, for example, by how much network
> bandwidth you have).
>
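To picture why several groups can read the same partition, here is a toy in-memory model in which each group keeps its own read position. `PartitionLog`, `poll` and the group names are hypothetical and only illustrate the idea; they are not a Kafka client API.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list plus per-group offsets."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # group name -> next offset that group will read

    def append(self, message):
        self.messages.append(message)

    def poll(self, group):
        # Each group advances its own offset; groups never affect each other.
        start = self.offsets.get(group, 0)
        batch = self.messages[start:]
        self.offsets[group] = len(self.messages)
        return batch


log = PartitionLog()
log.append("a")
log.append("b")
print(log.poll("billing"))  # the billing group reads everything so far
print(log.poll("audit"))    # the audit group independently reads the same messages
```

Reading is non-destructive in this model: one group polling does not remove messages for another group.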
>
> > I also have some questions regarding the relationship between keys and
> > partitions with regard to the API. I only looked at the .NET APIs
> > (especially the one from MS), but it looks like they mimic the Java API.
> > When using a producer to send a message to a topic there is a key
> > parameter. But when a consumer reads from a topic there is a partition
> > number.
> >
> > 1. How are partitions numbered? Starting from 0 or 1?
> >
>
> Starting from zero. A topic with eight partitions will have them numbered
> zero through seven.
>
>
> > 2. What exactly is the relationship between a key and a partition?
> > As I understand it, some function on the key will determine a partition.
> > Is that correct?
> >
>
> When using keys, the default partitioner will hash the key and write the
> message to a partition chosen from that hash. All messages produced with
> that same key
> will be written to the same partition. This is useful, for example, if you
> want to make sure all messages that deal with a particular user are in the
> same partition to optimize processing on your consumers.
>
> Note that the hashing and partition assignment will be consistent as long
> as the number of partitions remains the same. If you change the number of
> partitions for the topic, the assignments of keys to partitions will
> change.
>
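A minimal sketch of the hash-then-modulo idea, assuming MD5 purely as a stand-in hash (the real Java client uses a different function, murmur2, so the partition numbers here will not match a real cluster). `partition_for` and the `user-N` keys are invented for the example.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition number: hash the key, then take it modulo N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# The same key always maps to the same partition while N stays fixed.
print(partition_for("user-42", 8) == partition_for("user-42", 8))

# Growing the topic from 8 to 9 partitions reshuffles many keys.
keys = [f"user-{i}" for i in range(1000)]
moved = sum(1 for k in keys if partition_for(k, 8) != partition_for(k, 9))
print(f"{moved} of {len(keys)} keys changed partition")
```

The second part is why adding partitions to a keyed topic breaks the "all messages for this key are in one partition" property for data written before and after the change.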
>
> > 3. If I have 2 partitions in a topic and want some particular messages
> > to go to one partition and other messages to go to another, should I
> > use a specific key for one specific partition, and the rest for another?
> >
>
> For 2 partitions, it is possible, but not guaranteed, that if you produce
> messages with 2 different keys they will end up in different partitions
> (with an evenly distributed hash it is roughly a coin flip). If
> you truly want to assure partitioning like that, you will need to use a
> custom partitioner. This will allow you to specify, in the producer,
> exactly what partition each message gets produced to.
>
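The custom-partitioner idea for the two-partition case could look roughly like this. It is a language-neutral sketch, not the partitioner interface of any real client; `ROUTES`, `explicit_partition` and the key names are hypothetical.

```python
# Hypothetical routing table: which key goes to which partition.
ROUTES = {"orders": 0, "payments": 1}


def explicit_partition(key, num_partitions=2):
    """Return the partition configured for this key instead of hashing it."""
    partition = ROUTES[key]  # raises KeyError for keys with no configured route
    if not 0 <= partition < num_partitions:
        raise ValueError(f"route for {key!r} points at a missing partition")
    return partition


print(explicit_partition("orders"))
print(explicit_partition("payments"))
```

Because the mapping is explicit, the two keys can never collide the way two hashed keys might.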
>
> > 4. What if I have 3 partitions and want one type of message to go to
> > one particular partition and the rest to the other two?
> >
>
> Again, this is a case where you need to use a custom partitioner if you
> want to get fancy.
>
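For the three-partition case, one way to "get fancy" is to pin one message type to partition 0 and hash everything else across the remaining partitions. This is a sketch under assumptions: the `"priority"` type, the `route` function and the use of MD5 as the hash are all made up for illustration.

```python
import hashlib


def route(message_type, key, num_partitions=3):
    """Pin one message type to partition 0; hash the rest over partitions 1..N-1."""
    if message_type == "priority":
        return 0
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Shift by one so hashed messages never land on the reserved partition 0.
    return 1 + int.from_bytes(digest[:4], "big") % (num_partitions - 1)


print(route("priority", "user-1"))
print(sorted({route("click", f"user-{i}") for i in range(100)}))
```

A consumer that only cares about the pinned type could then read partition 0 alone, while the other two partitions keep per-key ordering for everything else.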
>
> > 5. How, in general, do I send messages to a particular partition so that
> > a consumer knows where to read from?
> > Or am I better off with multiple topics?
> >
>
> Either way is OK. I usually use the guideline that if the messages are of
> different types (e.g. one is a page view event, and one is a search event),
> they should probably be in different topics named appropriately. This
> allows your consumers to know exactly what they are dealing with, and you
> never know when you’ll have a consumer that will care about one and not the
> other. You want to minimize a consumer reading messages and throwing them
> away because they’re not what it wants to be reading.
>
> Doing something like having a topic per user can be very problematic as the
> scale starts to increase. Yes, you can certainly use a wildcard consumer,
> but if you’re not doing that you have to maintain some mapping of consumers
> to topics. And if you are using a wildcard consumer, you’re going to run
> into issues with the number of topics any given group is consuming at some
> point. Your system may work fine for 5 topics, but what about when it grows
> to 100? 1000? A million?
>
>
> >
> > Thanks in advance.
> >
>
>
>
> *—-*
> *Todd Palino*
> Staff Site Reliability Engineer
> Data Infrastructure Streaming
>
>
>
> linkedin.com/in/toddpalino
>
