Knowing that the partitioning is consistent for a given key means that
(apart from other benefits) a given consumer only deals with a partition of
the keyspace. So if you are in a system with tens of millions of users,
each consumer only has to store state for a small number of them; with
inconsistent partitioning, each consumer would have to be able to handle
all of the users. This could be as simple as storing a bit of data per user
or something much more complicated. You may not care which consumer a given
user ends up on, just that they don't end up on more than one for long
periods of time.
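
As a minimal sketch (assuming the new Java producer and a hypothetical
"user-events" topic), keying every record by user id is enough to get this
behavior: records with the same key always hash to the same partition, so
whichever consumer owns that partition sees every event for those users,
and the consumer side needs no special logic at all:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    public class KeyedSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed broker address; adjust for your cluster.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // Same key -> same partition, so all of user-42's events land on
            // one partition and therefore on one consumer at a time.
            producer.send(new ProducerRecord<>("user-events", "user-42", "logged-in"));
            producer.close();
        }
    }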

Christian

On Thu, Oct 16, 2014 at 8:20 AM, <gshap...@cloudera.com> wrote:

> It may be a minority; I can't tell yet. But in some apps we need to know
> that a consumer that is assigned a single partition will get all data
> about a subset of users.
> This is way more flexible than multiple topics since we still have the
> benefits of partition reassignment, load balancing between consumers,
> fault protection, etc.
>
> —
> Sent from Mailbox
>
> On Thu, Oct 16, 2014 at 9:52 AM, Kyle Banker <kyleban...@gmail.com> wrote:
>
> > I didn't realize that anyone used partitions to logically divide a topic.
> > When would that be preferable to simply having a separate topic? Isn't
> > this a minority case?
> > On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira <gshap...@cloudera.com>
> > wrote:
> >> Just note that this is not a universal solution. Many use-cases care
> >> about which partition you end up writing to, since partitions are used
> >> to... well, partition logical entities such as customers and users.
> >>
> >>
> >>
> >> On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao <jun...@gmail.com> wrote:
> >> > Kyle,
> >> >
> >> > What you wanted is not supported out of the box. You can achieve this
> >> > using the new java producer, which allows you to pick an arbitrary
> >> > partition when sending a message. If you receive a
> >> > NotEnoughReplicasException when sending a message, you can resend it
> >> > to another partition (a rough sketch of this appears after the quoted
> >> > thread below).
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker <kyleban...@gmail.com>
> >> > wrote:
> >> >
> >> >> Consider a 12-node Kafka cluster with a 200-partition topic having a
> >> >> replication factor of 3. Let's assume, in addition, that we're
> >> >> running Kafka v0.8.2, we've disabled unclean leader election, acks
> >> >> is -1, and min.isr is 2.
> >> >>
> >> >> Now suppose we lose 2 nodes. In this case, there's a good chance that
> >> >> 2/3 replicas of one or more partitions will be unavailable. This
> >> >> means that messages assigned to those partitions will not be
> >> >> writable. If we're writing a large number of messages, I would expect
> >> >> that all producers would eventually halt. It is somewhat surprising
> >> >> that, if we rely on a basic durability setting, the cluster would
> >> >> likely be unavailable even after losing only 2 of 12 nodes.
> >> >>
> >> >> It might be useful in this scenario for the producer to be able to
> >> >> detect which partitions are no longer available and reroute messages
> >> >> that would have hashed to the unavailable partitions (as defined by
> >> >> our acks and min.isr settings). This way, the cluster as a whole
> >> >> would remain available for writes at the cost of a slightly higher
> >> >> load on the remaining machines.
> >> >>
> >> >> Is this limitation accurately described? Is the proposed producer
> >> >> functionality worth pursuing?
> >> >>
> >>
>
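
For the curious, here is a rough sketch of Jun's suggestion above, assuming
the new Java producer. The class name, helper signature, and retry policy
are mine for illustration, not an established API: the idea is simply to
target a preferred partition explicitly and fall back to the next one
whenever the broker reports NotEnoughReplicasException.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;
    import java.util.concurrent.ExecutionException;

    public class ReroutingSender {
        // Hypothetical helper: try the preferred (hash-based) partition
        // first, then walk the remaining partitions if the broker reports
        // too few in-sync replicas for the current one.
        static void sendWithReroute(KafkaProducer<String, String> producer,
                                    String topic, String key, String value,
                                    int numPartitions) throws Exception {
            // Mask the sign bit so the preferred partition is non-negative.
            int preferred = (key.hashCode() & 0x7fffffff) % numPartitions;
            for (int i = 0; i < numPartitions; i++) {
                int partition = (preferred + i) % numPartitions;
                try {
                    // Blocking get() surfaces broker errors; a production
                    // version would more likely use the async callback.
                    producer.send(
                        new ProducerRecord<>(topic, partition, key, value)).get();
                    return; // acknowledged by the required in-sync replicas
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof NotEnoughReplicasException) {
                        continue; // partition under-replicated; try the next one
                    }
                    throw e; // other failures are out of scope for this sketch
                }
            }
            throw new IllegalStateException(
                "no partition of " + topic + " satisfied min.isr");
        }
    }

Note that rerouting keyed messages this way trades away exactly the per-key
partition consistency described at the top of this thread, which is Gwen's
caveat: it keeps the cluster writable, but a user's events may temporarily
land on more than one partition.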
