[ 
https://issues.apache.org/jira/browse/KAFKA-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KahnCheny updated KAFKA-12793:
------------------------------
    Summary: Client-side Circuit Breaker for Partition Write Errors  (was: 
KIP-693 Client-side Circuit Breaker for Partition Write Errors)

> Client-side Circuit Breaker for Partition Write Errors
> ------------------------------------------------------
>
>                 Key: KAFKA-12793
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12793
>             Project: Kafka
>          Issue Type: New Feature
>          Components: clients
>            Reporter: KahnCheny
>            Priority: Major
>
> When Kafka is used to build data pipelines in mission-critical business
> scenarios, availability and throughput are the most important operational
> goals, and they must be maintained in the presence of transient or permanent
> local failures. One typical situation that requires Ops intervention is disk
> failure: some partitions see long write latency caused by extremely high
> disk utilization. Because all partitions share the same buffer under the
> current producer thread model, the buffer fills up quickly and eventually
> the healthy partitions are impacted as well. The cluster-level success rate
> and timeout ratio degrade until the local infrastructure issue is resolved.
> One way to mitigate this issue is to add a client-side mechanism that
> short-circuits problematic partitions during transient failures. Similar
> approaches are applied in other distributed systems and RPC frameworks.
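
To make the short-circuit idea in the description concrete, here is a minimal sketch of a per-partition circuit breaker. It is an illustration only: the class name PartitionCircuitBreaker, its methods, and the threshold and cool-down parameters are hypothetical and are not part of the Kafka client API or of this issue's proposal. The sketch assumes the caller reports each write outcome and supplies the current time in milliseconds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-partition circuit breaker; not part of the Kafka client API. */
final class PartitionCircuitBreaker {

    /** Mutable state for one partition, guarded by its own monitor. */
    private static final class State {
        int consecutiveFailures;
        long openedAtMs = -1L; // -1 means the circuit is closed
    }

    private final int failureThreshold; // assumed: failures in a row before opening
    private final long openDurationMs;  // assumed: cool-down before a probe is allowed
    private final Map<Integer, State> states = new ConcurrentHashMap<>();

    PartitionCircuitBreaker(int failureThreshold, long openDurationMs) {
        this.failureThreshold = failureThreshold;
        this.openDurationMs = openDurationMs;
    }

    /** Returns true if a write to the partition should be attempted at nowMs. */
    boolean allows(int partition, long nowMs) {
        State s = states.get(partition);
        if (s == null) {
            return true; // no failures recorded yet
        }
        synchronized (s) {
            if (s.openedAtMs < 0) {
                return true; // circuit closed
            }
            // Circuit open: allow a probe only once the cool-down has elapsed.
            return nowMs - s.openedAtMs >= openDurationMs;
        }
    }

    /** A successful write closes the circuit and clears the failure count. */
    void recordSuccess(int partition) {
        State s = states.get(partition);
        if (s == null) {
            return;
        }
        synchronized (s) {
            s.consecutiveFailures = 0;
            s.openedAtMs = -1L;
        }
    }

    /** A failed write opens (or re-opens) the circuit once the threshold is reached. */
    void recordFailure(int partition, long nowMs) {
        State s = states.computeIfAbsent(partition, p -> new State());
        synchronized (s) {
            s.consecutiveFailures++;
            if (s.consecutiveFailures >= failureThreshold) {
                s.openedAtMs = nowMs;
            }
        }
    }
}
```

In a producer, one possible wiring would be to call recordSuccess or recordFailure from the send callback, depending on whether the exception argument is null, and to have the partition-selection logic skip partitions for which allows returns false. How skipped records should be redirected or rejected is a design choice this sketch leaves open.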



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
