It may be a minority use case; I can't tell yet. But in some apps we need to know that a consumer assigned a single partition will get all of the data for a subset of users. This is far more flexible than using multiple topics, since we still get the benefits of partition reassignment, load balancing between consumers, fault protection, etc.
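For example, keying every message by user id with the new Java producer is enough for this: the default partitioner hashes the key, so all events for a given user land on one partition and therefore reach whichever consumer currently owns it. A rough sketch (broker address, topic name, and the sample key/value below are made up):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedUserProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Same key => same partition (the default partitioner hashes the key),
        // so one consumer sees the full event stream for this user.
        producer.send(new ProducerRecord<>("user-events", "user-42", "{\"action\":\"login\"}"));
        producer.close();
    }
}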
— Sent from Mailbox

On Thu, Oct 16, 2014 at 9:52 AM, Kyle Banker <kyleban...@gmail.com> wrote:

> I didn't realize that anyone used partitions to logically divide a topic.
> When would that be preferable to simply having a separate topic? Isn't this
> a minority case?
>
> On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Just note that this is not a universal solution. Many use cases care
>> about which partition you end up writing to, since partitions are used
>> to... well, partition logical entities such as customers and users.
>>
>> On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao <jun...@gmail.com> wrote:
>>
>> > Kyle,
>> >
>> > What you want is not supported out of the box. You can achieve this
>> > using the new Java producer, which allows you to pick an arbitrary
>> > partition when sending a message. If you receive a
>> > NotEnoughReplicasException when sending a message, you can resend it
>> > to another partition.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker <kyleban...@gmail.com> wrote:
>> >
>> >> Consider a 12-node Kafka cluster with a 200-partition topic having a
>> >> replication factor of 3. Assume, in addition, that we're running Kafka
>> >> v0.8.2, unclean leader election is disabled, acks is -1, and min.isr is 2.
>> >>
>> >> Now suppose we lose 2 nodes. There's a good chance that 2 of the 3
>> >> replicas of one or more partitions will then be unavailable, which means
>> >> that messages assigned to those partitions will not be writable. If we're
>> >> writing a large number of messages, I would expect all producers to
>> >> eventually halt. It is somewhat surprising that, relying on a fairly basic
>> >> durability setting, the cluster would likely become unavailable after
>> >> losing only 2 of 12 nodes.
>> >>
>> >> It might be useful in this scenario for the producer to detect which
>> >> partitions are no longer available (as defined by our acks and min.isr
>> >> settings) and reroute messages that would have hashed to the unavailable
>> >> partitions. This way, the cluster as a whole would remain available for
>> >> writes at the cost of slightly higher load on the remaining machines.
>> >>
>> >> Is this limitation accurately described? Is the proposed producer
>> >> functionality worth pursuing?
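Jun's workaround, sketched against the new Java producer: send to an explicit partition and, if the broker rejects the write with NotEnoughReplicasException (i.e. that partition has fallen below min.isr), retry the same message on another partition. The topic name, partition count, and routing logic below are only illustrative:

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;

public class ReroutingProducer {
    // Matches the 200-partition topic in the example scenario.
    private static final int NUM_PARTITIONS = 200;

    static void sendWithReroute(KafkaProducer<byte[], byte[]> producer,
                                String topic, int preferredPartition,
                                byte[] value) throws Exception {
        int partition = preferredPartition;
        for (int attempt = 0; attempt < NUM_PARTITIONS; attempt++) {
            try {
                // Explicit partition; blocking on the Future surfaces broker errors.
                producer.send(new ProducerRecord<byte[], byte[]>(topic, partition, null, value)).get();
                return;
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    // Too few in-sync replicas for this partition; try the next one.
                    partition = (partition + 1) % NUM_PARTITIONS;
                } else {
                    throw e;
                }
            }
        }
        throw new IllegalStateException("No writable partition found for topic " + topic);
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("acks", "all");                         // same as acks=-1
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        sendWithReroute(producer, "events", 17, "payload".getBytes());
        producer.close();
    }
}

Note that this trades placement for availability: messages that would have gone to an under-replicated partition end up elsewhere, which is exactly why it isn't a universal answer when the partition assignment itself carries meaning, as Gwen points out above.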