[ https://issues.apache.org/jira/browse/KAFKA-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KahnCheny updated KAFKA-12793:
------------------------------
    Summary: Client-side Circuit Breaker for Partition Write Errors  (was: KIP-693 Client-side Circuit Breaker for Partition Write Errors)

> Client-side Circuit Breaker for Partition Write Errors
> ------------------------------------------------------
>
>                 Key: KAFKA-12793
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12793
>             Project: Kafka
>          Issue Type: New Feature
>          Components: clients
>            Reporter: KahnCheny
>            Priority: Major
>
> When Kafka is used to build data pipelines in mission-critical business
> scenarios, availability and throughput are the most important operational
> goals, and they need to be maintained in the presence of transient or
> permanent local failures. One typical situation that requires Ops
> intervention is disk failure: some partitions have long write latency caused
> by extremely high disk utilization. Since all partitions share the same
> buffer under the current producer thread model, the buffer fills up quickly
> and eventually the healthy partitions are impacted as well. The cluster-level
> success rate and timeout ratio degrade until the local infrastructure issue
> is resolved.
> One way to mitigate this issue is to add a client-side mechanism that
> short-circuits problematic partitions during a transient failure. A similar
> approach is applied in other distributed systems and RPC frameworks.
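
To make the proposed mechanism concrete, here is a minimal sketch of what a per-partition circuit breaker could look like. It is not taken from the issue or from the Kafka client: the class name PartitionCircuitBreaker, its methods, the failure threshold, and the open duration are all hypothetical. A partition is short-circuited after a number of consecutive write failures, then probed again after a cool-down period.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical per-partition circuit breaker; NOT part of the Kafka client API. */
final class PartitionCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private static final class Entry {
        State state = State.CLOSED;
        int consecutiveFailures = 0;
        long openedAtMs = 0L;
    }

    private final int failureThreshold;   // assumed tuning knob
    private final long openDurationMs;    // assumed tuning knob
    private final LongSupplier clockMs;   // injectable clock, e.g. System::currentTimeMillis
    private final Map<Integer, Entry> entries = new HashMap<>();

    PartitionCircuitBreaker(int failureThreshold, long openDurationMs, LongSupplier clockMs) {
        this.failureThreshold = failureThreshold;
        this.openDurationMs = openDurationMs;
        this.clockMs = clockMs;
    }

    /** Returns true if the partition may currently receive writes. */
    synchronized boolean allows(int partition) {
        Entry e = entries.get(partition);
        if (e == null || e.state == State.CLOSED) {
            return true;
        }
        if (e.state == State.OPEN && clockMs.getAsLong() - e.openedAtMs >= openDurationMs) {
            // Cool-down elapsed: let writes through as probes until a result arrives.
            e.state = State.HALF_OPEN;
        }
        return e.state == State.HALF_OPEN;
    }

    /** A successful write resets the partition to CLOSED. */
    synchronized void recordSuccess(int partition) {
        entries.remove(partition);
    }

    /** A failed write trips the breaker at the threshold, or at once while probing. */
    synchronized void recordFailure(int partition) {
        Entry e = entries.computeIfAbsent(partition, p -> new Entry());
        e.consecutiveFailures++;
        if (e.state == State.HALF_OPEN || e.consecutiveFailures >= failureThreshold) {
            e.state = State.OPEN;
            e.openedAtMs = clockMs.getAsLong();
        }
    }

    /** Last recorded state; does not itself apply the cool-down transition. */
    synchronized State state(int partition) {
        Entry e = entries.get(partition);
        return e == null ? State.CLOSED : e.state;
    }
}
```

One place such a check could be consulted is a custom org.apache.kafka.clients.producer.Partitioner that skips partitions for which allows() returns false, with recordSuccess() and recordFailure() driven from the send callback. That wiring is an assumption for illustration; the issue does not specify where the breaker would live.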