It may be a minority use case; I can't tell yet. But in some apps we need to know that a consumer assigned a single partition will get all of the data for a subset of users. This is far more flexible than using multiple topics, since we still get the benefits of partition reassignment, load balancing between consumers, fault protection, etc.
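For example, keying every message by user id with the new Java producer is enough for this: the default partitioner hashes the key, so all events for a given user land on one partition and therefore reach whichever consumer currently owns it. A rough sketch (broker address, topic name, and the sample key/value below are made up):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedUserProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Same key => same partition (the default partitioner hashes the key),
        // so one consumer sees the full event stream for this user.
        producer.send(new ProducerRecord<>("user-events", "user-42", "{\"action\":\"login\"}"));
        producer.close();
    }
}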
— Sent from Mailbox

On Thu, Oct 16, 2014 at 9:52 AM, Kyle Banker <kyleban...@gmail.com> wrote:

> I didn't realize that anyone used partitions to logically divide a topic.
> When would that be preferable to simply having a separate topic? Isn't this
> a minority case?
>
> On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Just note that this is not a universal solution. Many use cases care
>> about which partition you end up writing to, since partitions are used
>> to... well, partition logical entities such as customers and users.
>>
>> On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao <jun...@gmail.com> wrote:
>>
>> > Kyle,
>> >
>> > What you want is not supported out of the box. You can achieve this
>> > using the new Java producer, which allows you to pick an arbitrary
>> > partition when sending a message. If you receive a
>> > NotEnoughReplicasException when sending a message, you can resend it
>> > to another partition.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker <kyleban...@gmail.com> wrote:
>> >
>> >> Consider a 12-node Kafka cluster with a 200-partition topic having a
>> >> replication factor of 3. Assume, in addition, that we're running Kafka
>> >> v0.8.2, unclean leader election is disabled, acks is -1, and min.isr is 2.
>> >>
>> >> Now suppose we lose 2 nodes. There's a good chance that 2 of the 3
>> >> replicas of one or more partitions will then be unavailable, which means
>> >> that messages assigned to those partitions will not be writable. If we're
>> >> writing a large number of messages, I would expect all producers to
>> >> eventually halt. It is somewhat surprising that, relying on a fairly basic
>> >> durability setting, the cluster would likely become unavailable after
>> >> losing only 2 of 12 nodes.
>> >>
>> >> It might be useful in this scenario for the producer to detect which
>> >> partitions are no longer available (as defined by our acks and min.isr
>> >> settings) and reroute messages that would have hashed to the unavailable
>> >> partitions. This way, the cluster as a whole would remain available for
>> >> writes at the cost of slightly higher load on the remaining machines.
>> >>
>> >> Is this limitation accurately described? Is the proposed producer
>> >> functionality worth pursuing?
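Jun's workaround, sketched against the new Java producer: send to an explicit partition and, if the broker rejects the write with NotEnoughReplicasException (i.e. that partition has fallen below min.isr), retry the same message on another partition. The topic name, partition count, and routing logic below are only illustrative:

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;

public class ReroutingProducer {
    // Matches the 200-partition topic in the example scenario.
    private static final int NUM_PARTITIONS = 200;

    static void sendWithReroute(KafkaProducer<byte[], byte[]> producer,
                                String topic, int preferredPartition,
                                byte[] value) throws Exception {
        int partition = preferredPartition;
        for (int attempt = 0; attempt < NUM_PARTITIONS; attempt++) {
            try {
                // Explicit partition; blocking on the Future surfaces broker errors.
                producer.send(new ProducerRecord<byte[], byte[]>(topic, partition, null, value)).get();
                return;
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    // Too few in-sync replicas for this partition; try the next one.
                    partition = (partition + 1) % NUM_PARTITIONS;
                } else {
                    throw e;
                }
            }
        }
        throw new IllegalStateException("No writable partition found for topic " + topic);
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("acks", "all");                         // same as acks=-1
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        sendWithReroute(producer, "events", 17, "payload".getBytes());
        producer.close();
    }
}

Note that this trades placement for availability: messages that would have gone to an under-replicated partition end up elsewhere, which is exactly why it isn't a universal answer when the partition assignment itself carries meaning, as Gwen points out above.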