Sticky Partitioner

Eevee Mon, 30 Nov 2020 07:10:14 -0800

Hi all,

I've noticed a couple of edge cases in the Sticky Partitioner and I'd like to discuss introducing a new KIP to fix them.


Behavior
1. Low throughput producers

The first edge case occurs when a broker becomes temporarily unavailable for a period less than replica.lag.time.max.ms. If you have a low throughput producer generating records without a key and using a small value of linger.ms, you will quickly hit the max.in.flight.requests.per.connection limit for that broker, or for another broker which depends on the unavailable broker to achieve acks=all. At this point, all records will be redirected to whichever broker hits max.in.flight.requests.per.connection first, and if the producer has low enough throughput compared to batch.size, this will result in no records being sent to any broker until the failing broker becomes available again. Effectively, this transforms a short broker failure into a cluster failure. Ideally, we'd see all records redirected away from these brokers rather than to them.

2. Overwhelmed brokers

The second edge case occurs when an individual broker begins underperforming and cannot keep up with the producers. Once the broker hits max.in.flight.requests.per.connection, the producer will begin redirecting all records without keys to that broker. This results in a disproportionate percentage of the cluster load going to the failing broker and begins a death spiral in which the broker becomes more and more overwhelmed, resulting in the producers redirecting more and more of the cluster's load towards it.


Proposed Changes

We need a solution which fixes the interaction between the back pressure mechanism max.in.flight.requests.per.connection and the sticky partitioner.

My current thought is we should remove partitions associated with brokers which have hit max.in.flight.requests.per.connection from the available choices for the sticky partitioner. Once they are below max.in.flight.requests.per.connection, they'd then be added back into the available partition list.
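To make the idea concrete, here is a rough sketch of the filtering step in plain Java. It is not based on the actual producer internals: the class name, the leaderByPartition array (index = partition, value = leader broker id) and the inFlightByBroker map are all hypothetical stand-ins for whatever state the producer really tracks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in the Kafka client.
final class InFlightAwareFilter {

    /**
     * Returns the partitions whose leader broker is still below the
     * in-flight request limit.
     *
     * leaderByPartition[p] is the (assumed) leader broker id of partition p.
     * inFlightByBroker maps broker id to its current number of in-flight
     * requests; brokers missing from the map are treated as having zero.
     */
    static List<Integer> eligiblePartitions(
            int[] leaderByPartition,
            Map<Integer, Integer> inFlightByBroker,
            int maxInFlightPerConnection) {
        List<Integer> eligible = new ArrayList<>();
        for (int partition = 0; partition < leaderByPartition.length; partition++) {
            int broker = leaderByPartition[partition];
            int inFlight = inFlightByBroker.getOrDefault(broker, 0);
            // A broker at the limit cannot accept another request, so the
            // sticky partitioner should not pick any partition it leads.
            if (inFlight < maxInFlightPerConnection) {
                eligible.add(partition);
            }
        }
        return eligible;
    }
}
```

Because the list is recomputed from the current in-flight counts each time, a broker's partitions come back automatically once it drops below the limit.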

My one concern is that this could cause further edge case behavior for producers with small values of linger.ms. In particular, I could see a scenario in which the producer hits max.in.flight.requests.per.connection for all brokers and then blocks on send() until a request returns rather than building up a new batch. It's possible (I'd need to investigate the send loop further) that the producer could create a new batch as soon as a request arrives, add a single record to it, immediately send it, and then block on send() again. This would result in the producer doing next to no batching and limiting its throughput drastically.

If this is the case, I figure we can allow the sticky partitioner to use all partitions if all brokers are at max.in.flight.requests.per.connection. In such a case it would add records to a single partition until a request completed or it hit batch.size and then picked a new partition at random.
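A rough sketch of that fallback in plain Java follows. Again, this is only an illustration under assumptions: StickyFallbackChooser, candidates(), onNewBatch() and the leaderByPartition / inFlightByBroker inputs are hypothetical names, not part of the Kafka client, and the sketch assumes the topic has at least one partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch only: none of these names exist in the Kafka client.
final class StickyFallbackChooser {
    private final Random random;
    private int stickyPartition = -1; // -1 means nothing chosen yet

    StickyFallbackChooser(Random random) {
        this.random = random;
    }

    /**
     * Partitions whose leader is below the in-flight limit, or every
     * partition when all brokers are at the limit (the fallback case).
     * leaderByPartition[p] is the assumed leader broker id of partition p.
     */
    static List<Integer> candidates(
            int[] leaderByPartition,
            Map<Integer, Integer> inFlightByBroker,
            int maxInFlightPerConnection) {
        List<Integer> all = new ArrayList<>();
        List<Integer> withCapacity = new ArrayList<>();
        for (int p = 0; p < leaderByPartition.length; p++) {
            all.add(p);
            int inFlight = inFlightByBroker.getOrDefault(leaderByPartition[p], 0);
            if (inFlight < maxInFlightPerConnection) {
                withCapacity.add(p);
            }
        }
        return withCapacity.isEmpty() ? all : withCapacity;
    }

    /** Keeps returning the same partition until onNewBatch() is called. */
    int currentPartition(List<Integer> candidates) {
        if (stickyPartition < 0) {
            return onNewBatch(candidates);
        }
        return stickyPartition;
    }

    /**
     * Meant to be called when a request completes or the batch reaches
     * batch.size: picks a new sticky partition at random.
     */
    int onNewBatch(List<Integer> candidates) {
        stickyPartition = candidates.get(random.nextInt(candidates.size()));
        return stickyPartition;
    }
}
```

The point of the fallback is that the producer keeps accumulating records into one batch while every broker is saturated, instead of sending single-record batches each time a request slot frees up.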


Feedback

Before writing a KIP, I'd love to hear people's feedback, alternatives and concerns.


Regards,
Evelyn.
