Hi Jun and Justine,

Many thanks for taking a look at our proposal and for the pointer! We have read up on the mechanism proposed to enhance StickyPartitioner.

Both methods aim to exclude brokers with transient errors and to prevent cluster-wide failure. The difference lies in the criterion used to decide whether a broker is problematic: our KIP uses the error condition of the write operation, whereas the heuristic in StickyPartitioner relies on internal state, namely the number of in-flight requests relative to max.in.flight.requests.per.connection.

IMHO, using the final result of the write operation makes the behavior simpler to reason about. It covers all error scenarios and can potentially support all implementations of Partitioner. In contrast, using intermediate state may trigger action prematurely, for example when the number of in-flight requests reaches max.in.flight.requests.per.connection (due to a small linger.ms value) while the buffer on the producer side is still healthy. In addition, in sequential mode max.in.flight.requests.per.connection is set to 1 and therefore cannot be leveraged.

Finally, as Justine pointed out, having AvailablePartitions reflect broker status (as enabled in our KIP) will benefit optimizations of the Partitioner in general, so the two classes of enhancements can coexist.

Cheers,
//George//

On 2020/12/08 18:19:43, Justine Olshan <jols...@confluent.io> wrote:
> Hi George,
> I've been looking at the discussion on improving the sticky partitioner,
> and one of the potential issues we discussed is how we could get
> information to the partitioner to tell it not to choose certain partitions.
> Currently, the partitioner can only use availablePartitionsForTopic. I took
> a quick look at your KIP and it seemed that your KIP would change what
> partitions are returned with this method. This seems like a step in the
> right direction for solving that issue too.
>
> I agree with Jun that looking at both of these issues and the proposed
> solutions would be very helpful.
> Justine
>
> On Tue, Dec 8, 2020 at 10:07 AM Jun Rao <j...@confluent.io> wrote:
>
> > Hi, George,
> >
> > Thanks for submitting the KIP. There was an earlier discussion on improving
> > the sticky partitioner in the producer
> > (https://lists.apache.org/thread.html/rae8d2d5587dae57ad9093a85181e0cb4256f10d1e57138ecdb3ef287%40%3Cdev.kafka.apache.org%3E).
> > It seems to be solving a very similar issue. It would be useful to analyze
> > both approaches and see which one solves the problem better.
> >
> > Jun
> >
> > On Tue, Dec 8, 2020 at 8:05 AM georgeshu(舒国强) <george...@tencent.com>
> > wrote:
> >
> > > Hello,
> > >
> > > We wrote up a KIP based on a straightforward mechanism implemented and
> > > tested in order to solve a practical issue in production.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-693%3A+Client-side+Circuit+Breaker+for+Partition+Write+Errors
> > >
> > > Look forward to hearing feedback and suggestions.
> > >
> > > Thanks!
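
For readers following the thread, the error-based criterion described in the reply at the top of this message could be sketched roughly as below. This is only an illustration in Python, not the KIP's implementation and not Kafka client code: the class name PartitionCircuitBreaker, the parameters failure_threshold and mute_seconds, and the fallback to all partitions when everything is muted are assumptions made for the example.

```python
import time


class PartitionCircuitBreaker:
    """Hypothetical sketch: mute a partition after repeated failed writes."""

    def __init__(self, failure_threshold=3, mute_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed knob, not a Kafka config
        self.mute_seconds = mute_seconds            # assumed knob, not a Kafka config
        self.clock = clock
        self.failures = {}     # partition -> consecutive failed writes
        self.muted_until = {}  # partition -> clock value at which the mute expires

    def on_completion(self, partition, error=None):
        """Record the final result of one write to `partition`."""
        if error is None:
            # A successful write resets the failure streak.
            self.failures[partition] = 0
            return
        count = self.failures.get(partition, 0) + 1
        self.failures[partition] = count
        if count >= self.failure_threshold:
            # Trip the breaker: hide the partition for a while, then retry it.
            self.muted_until[partition] = self.clock() + self.mute_seconds
            self.failures[partition] = 0

    def available_partitions(self, partitions):
        """Return the partitions a partitioner should choose from right now."""
        now = self.clock()
        healthy = [
            p for p in partitions
            if p not in self.muted_until or self.muted_until[p] <= now
        ]
        # Assumption: if every partition is muted, fall back to all of them
        # rather than leaving the partitioner with nothing to pick.
        return healthy or list(partitions)
```

Because the decision is driven only by completed writes, the sketch behaves the same whatever the in-flight limit is, including a limit of 1, and any partitioner that reads the filtered list benefits without changes of its own.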