I didn't realize that anyone used partitions to logically divide a topic. When would that be preferable to simply having a separate topic? Isn't this a minority case?
On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> Just note that this is not a universal solution. Many use-cases care
> about which partition you end up writing to since partitions are used
> to... well, partition logical entities such as customers and users.
>
> On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao <jun...@gmail.com> wrote:
> > Kyle,
> >
> > What you wanted is not supported out of box. You can achieve this using
> > the new java producer. The new java producer allows you to pick an
> > arbitrary partition when sending a message. If you receive
> > NotEnoughReplicasException when sending a message, you can resend it to
> > another partition.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker <kyleban...@gmail.com> wrote:
> >
> >> Consider a 12-node Kafka cluster with a 200-partition topic having a
> >> replication factor of 3. Let's assume, in addition, that we're running
> >> Kafka v0.8.2, we've disabled unclean leader election, acks is -1, and
> >> min.isr is 2.
> >>
> >> Now suppose we lose 2 nodes. In this case, there's a good chance that
> >> 2/3 replicas of one or more partitions will be unavailable. This means
> >> that messages assigned to those partitions will not be writable. If
> >> we're writing a large number of messages, I would expect that all
> >> producers would eventually halt. It is somewhat surprising that, if we
> >> rely on a basic durability setting, the cluster would likely be
> >> unavailable even after losing only 2 / 12 nodes.
> >>
> >> It might be useful in this scenario for the producer to be able to
> >> detect which partitions are no longer available and reroute messages
> >> that would have hashed to the unavailable partitions (as defined by our
> >> acks and min.isr settings). This way, the cluster as a whole would
> >> remain available for writes at the cost of a slightly higher load on
> >> the remaining machines.
> >>
> >> Is this limitation accurately described? Is the proposed producer
> >> functionality worth pursuing?
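
For reference, the rerouting idea discussed in the thread can be sketched in a few lines. This is only an illustration in plain Python, not Kafka client code: `hashed_partition`, `choose_partition`, and the `available` set are hypothetical names, and `zlib.crc32` is a stand-in for whatever hash a real partitioner uses. It assumes the producer somehow knows which partitions currently accept writes (for example, by remembering which ones recently failed with NotEnoughReplicasException).

```python
import zlib


def hashed_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to its 'home' partition with a stable hash.

    crc32 is only a stand-in here; it is not the hash Kafka uses.
    """
    return zlib.crc32(key) % num_partitions


def choose_partition(key: bytes, num_partitions: int, available: set) -> int:
    """Return the home partition if it is writable, otherwise the next
    writable partition, scanning forward and wrapping around.

    `available` is a hypothetical set of partition ids that currently
    satisfy the acks / min.isr requirements.
    """
    home = hashed_partition(key, num_partitions)
    for offset in range(num_partitions):
        candidate = (home + offset) % num_partitions
        if candidate in available:
            return candidate
    raise RuntimeError("no partition currently accepts writes")


if __name__ == "__main__":
    partitions = 200
    key = b"customer-42"
    home = hashed_partition(key, partitions)
    # Pretend the home partition has lost too many replicas.
    writable = set(range(partitions)) - {home}
    print("home:", home, "rerouted to:", choose_partition(key, partitions, writable))
```

Note that the rerouted message no longer lands on the partition its key hashes to, which is exactly the caveat Gwen raises: anything that depends on all messages for one customer or user living in one partition stops holding while the reroute is in effect.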