Hi David and Alexandre,

Thanks so much for your feedback! Here are my answers:

1. Yes, we have seen several cases of clients that create a new connection
per produce/consume request. One hypothesis is that someone who is used to
connection pooling may accidentally write a Kafka client that creates a new
connection every time.

2 & 4. That's a good point that I hadn't considered. I think it makes sense
to provide the ability to limit connection creation per IP as well. This is
not hard to implement: the broker already keeps track of the number of
connections per IP, and immediately closes a new connection if it comes from
an IP that has reached the connection limit. So we could additionally track
the connection creation rate per IP, and close a connection from an IP that
exceeds the rate. One slight concern is whether keeping track of per-IP
rates and quotas adds overhead (CPU and memory), but that is probably not a
problem if we use expiring sensors.

It would still make sense to limit the overall connection creation rate for
Kafka clusters that are shared among many different applications/clients,
since they may spike at the same time and bring the total rate too high.

3. Controlling connection queue sizes only controls the share of time that
network threads spend creating new connections (and accepting on the
Acceptor thread) vs. doing other work on each Processor iteration. It does
not directly control how the processing of connection creations relates to
other processing done by brokers, such as the work on request handler
threads. So, while controlling the queue size may mitigate the issue for
some workloads, it does not guarantee that. Plus, if we want to limit how
many connections are created per IP, the queue size approach would not work
unless we go with a "share" of the queue, which I think further obscures
what that setting means (and what we would achieve as an end result). Does
this answer the question?

If there are no objections, I will update the KIP to add per-IP connection
rate limits (config and enforcement).

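To make the per-IP idea in 2 & 4 a bit more concrete, here is a rough Python
sketch of tracking the connection creation rate per IP and dropping state
for IPs that have gone quiet. It is an illustration only, not broker code:
the names PerIpConnectionRateLimiter, try_accept, expire_idle,
max_per_window, window_sec and idle_expiry_sec are all hypothetical, and it
uses a simple sliding window instead of Kafka's sensor and quota classes.
Rejected attempts are not counted toward the rate in this sketch.

```python
import time
from collections import deque


class PerIpConnectionRateLimiter:
    """Hypothetical sketch: sliding-window connection rate limit per IP."""

    def __init__(self, max_per_window, window_sec=1.0,
                 idle_expiry_sec=60.0, clock=time.monotonic):
        self.max_per_window = max_per_window    # accepted connections allowed per window
        self.window_sec = window_sec            # window length in seconds
        self.idle_expiry_sec = idle_expiry_sec  # forget an IP after this much silence
        self.clock = clock                      # injectable clock, handy for testing
        self._accepted = {}                     # ip -> deque of accept timestamps

    def try_accept(self, ip):
        """Return True if a new connection from `ip` is within its rate limit."""
        now = self.clock()
        stamps = self._accepted.setdefault(ip, deque())
        # Drop timestamps that have fallen out of the current window.
        while stamps and now - stamps[0] >= self.window_sec:
            stamps.popleft()
        if len(stamps) >= self.max_per_window:
            return False  # caller would close the new connection
        stamps.append(now)
        return True

    def expire_idle(self):
        """Remove state for IPs with no recent accepts; return how many were removed."""
        now = self.clock()
        idle = [ip for ip, stamps in self._accepted.items()
                if not stamps or now - stamps[-1] >= self.idle_expiry_sec]
        for ip in idle:
            del self._accepted[ip]
        return len(idle)

    def tracked_ips(self):
        """Number of IPs currently holding state (a rough proxy for memory cost)."""
        return len(self._accepted)
```

An acceptor loop would call try_accept(ip) for each incoming connection and
close the connection when it returns False, while a periodic task would call
expire_idle() so that memory stays bounded by the number of recently active
IPs rather than by every IP ever seen.
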
Thanks,
Anna

On Tue, May 12, 2020 at 11:25 AM Alexandre Dupriez <alexandre.dupr...@gmail.com> wrote:

> Hello,
>
> Thank you for the KIP.
>
> I experienced in the past genuine broker brownouts due to connection
> storms consuming most of the CPU available on the server and I think
> it is useful to protect against it.
>
> I tend to share the questions asked in points 2 and 4 from David. Is
> there still a risk of denial of service if the limit applies at the
> listener-level without differentiating between (an) “offending”
> client(s) and the others?
>
> To rebound on point 3 - conceptually one difference between capping
> the queue size or throttling as presented in the KIP would come from
> the time it takes to accept a connection and how that time evolves
> with the connection rate.
> Assuming that that time increases monotonically with resource
> utilization, the admissible rate of connections would decrease as the
> server becomes more loaded, if the limit was set on queue size.
>
> Thanks,
> Alexandre
>
> On Tue, May 12, 2020 at 08:49, David Jacot <dja...@confluent.io> wrote:
> >
> > Hi Anna,
> >
> > Thanks for the KIP! I have a few questions:
> >
> > 1. You mention that some clients may create a new connection for each
> > request: "Another example is clients that create a new connection for
> > each produce/consume request". I am curious here but do we know any
> > clients behaving like this?
> >
> > 2. I am a bit concerned by the impact of misbehaving clients on the
> > other ones. Let's say that we define a quota of 10 connections / sec
> > for a broker and that we have a misbehaving application constantly
> > trying to create 20 connections on that broker. That application will
> > constantly hit the quota and always have many pending connections in
> > the queue waiting to be accepted. Regular clients trying to connect
> > would need to wait until all the pending connections upfront in the
> > queue are drained in the best case scenario or won't be able to
> > connect at all in the worst case scenario if the queue is full.
> > Does it sound like a valid concern? How do you see this?
> >
> > 3. As you mention it in the KIP, we use bounded queues which already
> > limit the maximum number of connections that can be accepted. I wonder
> > if we could reach the same goal by making the size of the queue
> > configurable.
> >
> > 4. Did you consider doing something similar to the connections quota
> > which limits the number of connections per IP? Instead of rate
> > limiting all the creations, we could perhaps rate limit the number of
> > creations per IP as well. That could perhaps reduce the effect on the
> > other clients. That may be harder to implement though.
> >
> > Best,
> > David
> >
> > On Mon, May 11, 2020 at 7:58 PM Anna Povzner <a...@confluent.io> wrote:
> >
> > > Hi,
> > >
> > > I just created KIP-612 to allow limiting connection creation rate on
> > > brokers, and would like to start a discussion.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers
> > >
> > > Feedback and suggestions are welcome!
> > >
> > > Thanks,
> > > Anna