Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-04-17 Thread Guoqiang Shu
Thanks a lot for the comments, Jun! Indeed, this is a practical solution that originated in the field, and we really appreciate the guidance to make it more general. Please refer to the embedded responses to the specific questions below. On 2021/04/07 17:59:59, Jun Rao wrote: > Hi, George, > > A f

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-04-12 Thread Guozhang Wang
Hello Guoqiang, Here is another interesting ticket that may also be related to the issues you observed and fixed in production, if you used the sticky partitioner in your producer clients: https://issues.apache.org/jira/browse/KAFKA-10888 Guozhang On Wed, Apr 7, 2021 at 11:00 AM Jun Rao wrote:

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-04-07 Thread Jun Rao
Hi, George, A few more comments on the KIP. 1. It would be useful to motivate the problem a bit more. For example, is the KIP trying to solve a transient broker problem (if so, for how long) or a permanent broker problem? It would also be useful to list some common causes that can slow the broker

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-03-24 Thread Guoqiang Shu
In our current proposal, it can be configured via producer.circuit.breaker.mute.retry.interval (default: 10 minutes), but perhaps 'interval' is a confusing name. On 2021/03/23 00:45:23, Guozhang Wang wrote: > Thanks for the updated KIP! Some more comments inlined. > > > > I'm still not sure

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-03-22 Thread Guozhang Wang
Thanks for the updated KIP! Some more comments inlined. On Sun, Mar 7, 2021 at 6:43 PM Guoqiang Shu wrote: > > > Guozhang, many thanks for taking a look! Sorry for the late reply, we have > iterated the prototype on our production setup with your question in mind > and updated the KIP correspond

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-03-07 Thread Guoqiang Shu
Guozhang, many thanks for taking a look! Sorry for the late reply; we have iterated on the prototype in our production setup with your questions in mind and updated the KIP accordingly. Below are the answers and a summary of the updates. > > 1) Why does the default implementation have a c

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2020-12-14 Thread Guozhang Wang
Hello George, Thanks for the KIP. Just a few questions to help us understand the design details: 1) Why does the default implementation have a criterion of when it is enabled -- i.e. after a certain number of messages have been successfully sent -- instead of always enabled? Also how would thi

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2020-12-14 Thread Guoqiang Shu
Hi Jun and Justine, Many thanks for taking a look at our proposal and for the pointer! We learned about the mechanism proposed to enhance StickyPartitioner. Both methods aim to exclude brokers with transient errors and prevent cluster-wide failure. The difference lies in the criteria used to t

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2020-12-08 Thread Justine Olshan
Hi George, I've been looking at the discussion on improving the sticky partitioner, and one of the potential issues we discussed is how we could get information to the partitioner to tell it not to choose certain partitions. Currently, the partitioner can only use availablePartitionsForTopic. I too

Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2020-12-08 Thread Jun Rao
Hi, George, Thanks for submitting the KIP. There was an earlier discussion on improving the sticky partitioner in the producer ( https://lists.apache.org/thread.html/rae8d2d5587dae57ad9093a85181e0cb4256f10d1e57138ecdb3ef287%40%3Cdev.kafka.apache.org%3E). It seems to be solving a very similar issue

[DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2020-12-08 Thread 舒国强
Hello, We wrote up a KIP based on a straightforward mechanism that we implemented and tested to solve a practical issue in production. https://cwiki.apache.org/confluence/display/KAFKA/KIP-693%3A+Client-side+Circuit+Breaker+for+Partition+Write+Errors We look forward to hearing feedback and suggesti