Thank you, Todd, for the great response.

On Sun, May 29, 2016 at 8:52 PM, Todd Palino <tpal...@gmail.com> wrote:
> Answers are in-line below.
>
> -Todd
>
> On Sun, May 29, 2016 at 3:00 PM, Igor Kravzov <igork.ine...@gmail.com>
> wrote:
>
> > Please help me with the subject.
> > In the Kafka documentation I found the following:
> >
> > *Kafka only provides a total order over messages within a partition, not
> > between different partitions in a topic. Per-partition ordering combined
> > with the ability to partition data by key is sufficient for most
> > applications. However, if you require a total order over messages this
> > can be achieved with a topic that has only one partition, though this
> > will mean only one consumer process per consumer group.*
> >
> > So here are my questions:
> > 1. Does it mean that if I want to have more than 1 consumer (from the
> > same group) reading from the same topic, I need to have more than 1
> > partition?
>
> Yes.
>
> > 2. Does it mean that I need the same number of partitions as consumers
> > in the same group?
>
> If you want all your consumers to be actively consuming, then you need at
> least as many partitions as you have consumers. You can have more
> partitions than you do consumers, and you can be assured that all
> partitions will be consumed. Note that you don’t have to have as many
> partitions as you do consumers - you could have some “warm spares” in place
> to pick up if others drop out.
>
> > 3. How many consumers can read from one partition?
>
> Only one consumer in a given consumer group can read from a single
> partition. You can have as many consumer groups reading the same topic as
> your system is able to handle (limited by, for example, how much network
> bandwidth you have).
>
> > I also have some questions regarding the relationship between keys and
> > partitions with regard to the API. I only looked at the .NET APIs
> > (especially the one from MS), but it looks like they mimic the Java API.
> > When using a producer to send a message to a topic there is a key
> > parameter. But when a consumer reads from a topic there is a partition
> > number.
> >
> > 1. How are partitions numbered? Starting from 0 or 1?
>
> Starting from zero. A topic with eight partitions will have them numbered
> zero through seven.
>
> > 2. What exactly is the relationship between a key and a partition?
> > As I understand it, some function on the key will determine a partition.
> > Is that correct?
>
> When using keys, the default partitioner will hash the key and write it to
> a partition based on the hash. All messages produced with that same key
> will be written to the same partition. This is useful, for example, if you
> want to make sure all messages that deal with a particular user are in the
> same partition to optimize processing on your consumers.
>
> Note that the hashing and partition assignment will be consistent as long
> as the number of partitions remains the same. If you change the number of
> partitions for the topic, the assignments of keys to partitions will
> change.
>
> > 3. If I have 2 partitions in a topic and want some particular messages
> > to go to one partition and other messages to go to another, should I use
> > a specific key for one specific partition, and the rest for another?
>
> For 2 partitions, it is likely, but not guaranteed, that if you produce
> messages with 2 different keys they will end up in different partitions. If
> you truly want to assure partitioning like that, you will need to use a
> custom partitioner. This will allow you to specify, in the producer,
> exactly what partition each message gets produced to.
>
> > 4. What if I have 3 partitions and want one type of message to go to one
> > particular partition and the rest to the other two?
>
> Again, this is a case where you need to use a custom partitioner if you
> want to get fancy.
>
> > 5. How, in general, do I send messages to a particular partition so that
> > a consumer knows where to read from?
> > Or am I better off with multiple topics?
>
> Either way is OK. I usually use the guideline that if the messages are of
> different types (e.g. one is a page view event, and one is a search event),
> they should probably be in different topics named appropriately. This
> allows your consumers to know exactly what they are dealing with, and you
> never know when you’ll have a consumer that will care about one and not the
> other. You want to minimize a consumer reading messages and throwing them
> away because they’re not what it wants to be reading.
>
> Doing something like having a topic per-user can be very problematic as the
> scale starts to increase. Yes, you can certainly use a wildcard consumer,
> but if you’re not doing that you have to maintain some mapping of consumers
> to topics. And if you are using a wildcard consumer, you’re going to run
> into issues with the number of topics any given group is consuming at some
> point. Your system may work fine for 5 topics, but what about when it grows
> to 100? 1000? A million?
>
> > Thanks in advance.
>
> --
> *Todd Palino*
> Staff Site Reliability Engineer
> Data Infrastructure Streaming
>
> linkedin.com/in/toddpalino
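
For anyone reading this thread later, here is a rough sketch of the three ideas discussed above: hashing a key to a partition, a custom rule that pins one message type to a fixed partition, and how partitions are spread over the consumers in one group. It is plain Python with no Kafka client involved. The function names, the "priority" message type, and the use of CRC32 are all assumptions made for illustration; Kafka's own default partitioner uses a different hash, and its real group assignment strategies are more involved than this round-robin loop.

```python
import zlib


def partition_for_key(key: str, num_partitions: int) -> int:
    """Stand-in for a default partitioner: hash the key, then take the
    result modulo the partition count. CRC32 is used only for illustration
    and is NOT the hash Kafka's client uses."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def custom_partition(message_type: str, key: str, num_partitions: int) -> int:
    """Hypothetical custom rule (question 4 above): 'priority' messages
    always go to partition 0; everything else is spread by key over
    partitions 1..num_partitions-1. Requires num_partitions >= 2."""
    if message_type == "priority":
        return 0
    return 1 + zlib.crc32(key.encode("utf-8")) % (num_partitions - 1)


def assign_partitions(consumers: list, num_partitions: int) -> dict:
    """Simplified round-robin assignment within ONE consumer group.
    Each partition is owned by exactly one consumer; consumers beyond
    the partition count get nothing and act as warm spares."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    # Same key, same partition count -> same partition every time.
    print(partition_for_key("user-42", 8), partition_for_key("user-42", 8))
    # Changing the partition count may move the key elsewhere.
    print(partition_for_key("user-42", 8), partition_for_key("user-42", 12))
    # Three consumers, two partitions: the third consumer sits idle.
    print(assign_partitions(["c1", "c2", "c3"], 2))
```

With three consumers and two partitions, the last call leaves "c3" with an empty list, which is the "warm spare" case. With a real client, the second function would correspond to a custom partitioner plugged into the producer.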