Knowing that the partitioning is consistent for a given key means that (apart from other benefits) a given consumer only deals with a partition of the keyspace. So if you are in a system with tens of millions of users, each consumer only has to store state for a small number of them; with inconsistent partitioning, each consumer would have to be able to handle all of the users. This could be as simple as storing a bit of data for each user, or something much more complicated. You may not care which consumer a given user ends up on, just that they don't end up on more than one for long periods of time.
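To make that concrete, here's a rough sketch with the new Java producer (the topic name, broker address, and user ids are just placeholders): keying each record by user id means the default partitioner hashes the same user to the same partition, so only the consumer that owns that partition ever needs to hold state for that user.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedUserProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // placeholder broker
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            // Keying by user id: the default partitioner hashes the key, so every
            // event for "user-42" lands in the same partition, and only the one
            // consumer owning that partition needs state for that user.
            producer.send(new ProducerRecord<>("user-events", "user-42", "logged_in"));
            producer.send(new ProducerRecord<>("user-events", "user-42", "clicked_buy"));

            producer.close();
        }
    }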
Christian

On Thu, Oct 16, 2014 at 8:20 AM, <gshap...@cloudera.com> wrote:
> It may be a minority, I can't tell yet. But in some apps we need to know
> that a consumer, who is assigned a single partition, will get all data
> about a subset of users.
> This is way more flexible than multiple topics since we still have the
> benefits of partition reassignment, load balancing between consumers,
> fault protection, etc.
>
> —
> Sent from Mailbox
>
> On Thu, Oct 16, 2014 at 9:52 AM, Kyle Banker <kyleban...@gmail.com> wrote:
>
> > I didn't realize that anyone used partitions to logically divide a topic.
> > When would that be preferable to simply having a separate topic? Isn't
> > this a minority case?
> > On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >> Just note that this is not a universal solution. Many use-cases care
> >> about which partition you end up writing to since partitions are used
> >> to... well, partition logical entities such as customers and users.
> >>
> >> On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao <jun...@gmail.com> wrote:
> >> > Kyle,
> >> >
> >> > What you wanted is not supported out of the box. You can achieve this
> >> > using the new java producer. The new java producer allows you to pick
> >> > an arbitrary partition when sending a message. If you receive
> >> > NotEnoughReplicasException when sending a message, you can resend it
> >> > to another partition.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker <kyleban...@gmail.com> wrote:
> >> >
> >> >> Consider a 12-node Kafka cluster with a 200-partition topic having a
> >> >> replication factor of 3. Let's assume, in addition, that we're running
> >> >> Kafka v0.8.2, we've disabled unclean leader election, acks is -1, and
> >> >> min.isr is 2.
> >> >>
> >> >> Now suppose we lose 2 nodes. In this case, there's a good chance that
> >> >> 2/3 replicas of one or more partitions will be unavailable. This means
> >> >> that messages assigned to those partitions will not be writable. If
> >> >> we're writing a large number of messages, I would expect that all
> >> >> producers would eventually halt. It is somewhat surprising that, if we
> >> >> rely on a basic durability setting, the cluster would likely be
> >> >> unavailable even after losing only 2 / 12 nodes.
> >> >>
> >> >> It might be useful in this scenario for the producer to be able to
> >> >> detect which partitions are no longer available and reroute messages
> >> >> that would have hashed to the unavailable partitions (as defined by
> >> >> our acks and min.isr settings). This way, the cluster as a whole would
> >> >> remain available for writes at the cost of a slightly higher load on
> >> >> the remaining machines.
> >> >>
> >> >> Is this limitation accurately described? Is the proposed producer
> >> >> functionality worth pursuing?
> >> >
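(Appending a quick sketch of the rerouting Jun mentions, in case it helps. This is only an illustration with the new Java producer: the helper name, the partition count, and the modulo hashing of the key are made up, and the blocking .get() keeps things simple.)

    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;

    public class ReroutingSend {
        // Try the "natural" partition first; on NotEnoughReplicas, walk the
        // remaining partitions until one accepts the write.
        static void sendWithReroute(KafkaProducer<String, String> producer,
                                    String topic, int numPartitions,
                                    String key, String value)
                throws InterruptedException, ExecutionException {
            int start = Math.abs(key.hashCode()) % numPartitions;
            ExecutionException last = null;
            for (int i = 0; i < numPartitions; i++) {
                int partition = (start + i) % numPartitions;
                try {
                    producer.send(new ProducerRecord<>(topic, partition, key, value)).get();
                    return; // acknowledged by at least min.isr replicas
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof NotEnoughReplicasException) {
                        last = e;   // this partition is under-replicated; try the next one
                        continue;
                    }
                    throw e;        // anything else is a real failure
                }
            }
            throw last;             // every partition was unavailable
        }
    }

Of course, any rerouted message loses the "same key, same partition" guarantee for as long as the original partition is unavailable, which is exactly the trade-off being discussed in this thread.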