Wouldn't this work only for producers using random partitioning?

On Tue, Oct 14, 2014 at 1:51 PM, Kyle Banker <kyleban...@gmail.com> wrote:
> Consider a 12-node Kafka cluster with a 200-partition topic having a
> replication factor of 3. Let's assume, in addition, that we're running
> Kafka v0.8.2, we've disabled unclean leader election, acks is -1, and
> min.isr is 2.
>
> Now suppose we lose 2 nodes. In this case, there's a good chance that 2/3
> replicas of one or more partitions will be unavailable. This means that
> messages assigned to those partitions will not be writable. If we're
> writing a large number of messages, I would expect that all producers would
> eventually halt. It is somewhat surprising that, if we rely on a basic
> durability setting, the cluster would likely be unavailable even after
> losing only 2 / 12 nodes.
>
> It might be useful in this scenario for the producer to be able to detect
> which partitions are no longer available and reroute messages that would
> have hashed to the unavailable partitions (as defined by our acks and
> min.isr settings). This way, the cluster as a whole would remain available
> for writes at the cost of a slightly higher load on the remaining machines.
>
> Is this limitation accurately described? Is the proposed producer
> functionality worth pursuing?
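
To put a rough number on the "good chance" above, here is a back-of-the-envelope sketch in Python. It assumes each partition's replicas sit on distinct brokers chosen uniformly at random and independently of other partitions. That is a simplification of my own: Kafka's actual replica assignment is not random, so treat the result as an estimate only. The function names are made up for illustration.

```python
from math import comb


def p_partition_blocked(brokers, rf, failed, min_isr):
    """Probability that one partition keeps fewer than min_isr live replicas.

    ASSUMPTION: the rf replicas are placed on distinct brokers chosen
    uniformly at random (hypergeometric model), which is not how Kafka
    actually assigns replicas.
    """
    total = comb(brokers, rf)
    blocked = 0
    for lost in range(rf + 1):  # lost = replicas that sat on failed brokers
        if rf - lost < min_isr:
            # comb() returns 0 when more are requested than exist
            blocked += comb(failed, lost) * comb(brokers - failed, rf - lost)
    return blocked / total


def p_any_partition_blocked(partitions, p_one):
    """Probability that at least one partition is blocked.

    ASSUMPTION: partitions are placed independently of each other.
    """
    return 1 - (1 - p_one) ** partitions


# Scenario from the message: 12 brokers, RF 3, 2 brokers lost, min.isr 2.
p_one = p_partition_blocked(brokers=12, rf=3, failed=2, min_isr=2)
p_any = p_any_partition_blocked(partitions=200, p_one=p_one)
expected_blocked = 200 * p_one

print(f"per-partition: {p_one:.4f}")
print(f"at least one of 200: {p_any:.4f}")
print(f"expected blocked partitions: {expected_blocked:.1f}")
```

Under those assumptions a given partition is blocked with probability 10/220 = 1/22, so roughly 9 of the 200 partitions would be expected to reject writes, and the chance that at least one does is above 0.999. That agrees with the description above: a producer spreading keyed messages across all partitions would hit a blocked one almost immediately.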