Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

Guoqiang Shu Mon, 14 Dec 2020 17:42:01 -0800


Hi Jun and Justine,

Many thanks for taking a look at our proposal and for the pointer! We have read 
about the mechanism proposed to enhance StickyPartitioner. Both approaches aim to 
exclude brokers with transient errors and prevent cluster-wide failure. The 
difference lies in the criterion used to tell whether a broker is problematic: 
our KIP uses the error result of the write operation, while the heuristic in 
StickyPartitioner relies on internal producer state, namely whether the number 
of in-flight requests to a broker has reached 
max.in.flight.requests.per.connection.
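
To make the error-driven criterion concrete, here is a minimal sketch of a 
per-partition breaker that reacts only to the final result of each write. It 
assumes a simple policy (trip after a fixed number of consecutive failed writes, 
then exclude the partition for a fixed mute period); the class name, 
`failureThreshold`, and `muteMillis` are hypothetical and are not the 
configuration names defined in KIP-693.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a per-partition circuit breaker driven only by the
 * final outcome of each write. The policy (consecutive failures + fixed mute
 * window) is an assumption for illustration, not the KIP-693 specification.
 */
final class PartitionCircuitBreaker {
    private final int failureThreshold;  // consecutive failures before tripping
    private final long muteMillis;       // how long a tripped partition stays excluded
    private final Map<Integer, Integer> consecutiveFailures = new HashMap<>();
    private final Map<Integer, Long> mutedUntil = new HashMap<>();

    PartitionCircuitBreaker(int failureThreshold, long muteMillis) {
        this.failureThreshold = failureThreshold;
        this.muteMillis = muteMillis;
    }

    /** Record the final result of a write (e.g. from the send callback). */
    void recordResult(int partition, boolean success, long nowMs) {
        if (success) {
            consecutiveFailures.remove(partition);  // any success resets the count
            return;
        }
        int failures = consecutiveFailures.merge(partition, 1, Integer::sum);
        if (failures >= failureThreshold) {
            mutedUntil.put(partition, nowMs + muteMillis);  // trip the breaker
            consecutiveFailures.remove(partition);
        }
    }

    /** A partition is available unless it is still inside its mute window. */
    boolean isAvailable(int partition, long nowMs) {
        Long until = mutedUntil.get(partition);
        if (until == null) {
            return true;
        }
        if (nowMs >= until) {
            mutedUntil.remove(partition);  // mute expired: let traffic through again
            return true;
        }
        return false;
    }
}
```

Because only completed writes feed the breaker, it behaves the same regardless 
of which Partitioner implementation later consumes the availability result.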

IMHO, using the final result of the write operation makes the behavior simpler 
to reason about. It covers all error scenarios and can potentially support all 
implementations of Partitioner. In contrast, using intermediate state may 
trigger action prematurely: for example, the number of in-flight requests may 
reach max.in.flight.requests.per.connection (due to a small linger.ms value) 
while the buffer on the producer side is still healthy. In addition, in 
sequential mode max.in.flight.requests.per.connection is set to 1, so it 
cannot be leveraged.

Finally, as Justine pointed out, having AvailablePartitions reflect broker 
status (as enabled in our KIP) will benefit Partitioner optimizations in 
general, so the two classes of enhancements can coexist.
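
As an illustration of why the two enhancements compose, here is a sketch of a 
sticky-style chooser that only consumes an availability list. It assumes the 
list already excludes muted partitions (the `healthy` predicate stands in for 
"leader known and not muted by the breaker"), and it falls back to all 
partitions when none are available, similar to how the default partitioner 
treats an empty availablePartitionsForTopic result. The class and method names 
are hypothetical, and unlike the real StickyPartitioner, which switches 
partitions when a new batch is started, this sketch only switches when the 
current partition becomes unavailable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

/** Hypothetical sticky-style chooser that relies solely on an availability list. */
final class AvailabilityAwareStickyChooser {
    private final Random random;
    private int current = -1;  // partition currently being "stuck" to; -1 = none yet

    AvailabilityAwareStickyChooser(Random random) {
        this.random = random;
    }

    /** Build the available list from a health predicate (e.g. leader known and not muted). */
    static List<Integer> availablePartitions(int numPartitions, IntPredicate healthy) {
        List<Integer> result = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (healthy.test(p)) {
                result.add(p);
            }
        }
        return result;
    }

    /** Keep the current partition while it stays available; otherwise pick a new one. */
    int choose(int numPartitions, List<Integer> available) {
        if (current >= 0 && available.contains(current)) {
            return current;  // stay sticky
        }
        if (available.isEmpty()) {
            current = random.nextInt(numPartitions);  // fall back to any partition
        } else {
            current = available.get(random.nextInt(available.size()));
        }
        return current;
    }
}
```

The chooser itself contains no error-handling logic; muting a partition in the 
availability list is enough to steer traffic away from it.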

Cheers,
//George//

On 2020/12/08 18:19:43, Justine Olshan <jols...@confluent.io> wrote: 
> Hi George,
> I've been looking at the discussion on improving the sticky partitioner,
> and one of the potential issues we discussed is how we could get
> information to the partitioner to tell it not to choose certain partitions.
> Currently, the partitioner can only use availablePartitionsForTopic. I took
> a quick look at your KIP and it seemed that your KIP would change what
> partitions are returned with this method. This seems like a step in the
> right direction for solving that issue too.
> 
> I agree with Jun that looking at both of these issues and the proposed
> solutions would be very helpful.
> Justine
> 
> On Tue, Dec 8, 2020 at 10:07 AM Jun Rao <j...@confluent.io> wrote:
> 
> > Hi, George,
> >
> > Thanks for submitting the KIP. There was an earlier discussion on improving
> > the sticky partitioner in the producer
> > (https://lists.apache.org/thread.html/rae8d2d5587dae57ad9093a85181e0cb4256f10d1e57138ecdb3ef287%40%3Cdev.kafka.apache.org%3E).
> > It seems to be solving a very similar issue. It would be useful to analyze
> > both approaches and see which one solves the problem better.
> >
> > Jun
> >
> > On Tue, Dec 8, 2020 at 8:05 AM georgeshu(舒国强) <george...@tencent.com>
> > wrote:
> >
> > > Hello,
> > >
> > > We wrote up a KIP based on a straightforward mechanism that we implemented
> > > and tested in order to solve a practical issue in production.
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-693%3A+Client-side+Circuit+Breaker+for+Partition+Write+Errors
> > > Look forward to hearing feedback and suggestions.
> > >
> > > Thanks!
> > >
> > >
> >
>
