Hi Yash,

Thanks for the KIP. It's a nice, focused change. Initially I was hesitant to support cases where connector validation takes this long, but considering the alternative is that we give users a 500 error response but leave the request to create/modify the connector queued up in the herder, I think I can get behind the motivation here. There's also an argument to be made about keeping Kafka Connect available even when the systems that it connects to are in a degraded state.

I have a few alternatives I'd be interested in your thoughts on:

1. Since the primary concern here seems to be that custom connector validation logic can take too long, do we have any thoughts on adding logic to check for request timeout after validation has completed and, if it has, aborting the attempt to create/modify the connector?

2. Right now it's possible that we'll perform two connector config validations per create/modify request: once on the worker that initially receives the request, and then again if that worker is not the leader of the cluster and has to forward the request to the leader. Any thoughts on optimizing this to only require a single validation per request? We probably wouldn't want to force all validations to take place on the leader (could lead to overloading it pretty quickly in large clusters), but we could add an internal-only query parameter to skip validation and then use that parameter when forwarding requests from followers to the leader.

3. A worker property is pretty coarse-grained, and difficult to change. We might allow per-request toggling of the timeout by adding a URL query parameter like '?timeout=90s' to the REST API to allow tweaking of the timeout on a more granular basis, and without having to perform a worker restart.

I'd also like to clarify a point about the rejected alternative "Allow configuring producer zombie fencing admin request timeout"--is the implication here that the "rest.api.request.timeout.ms" property will not control the REST timeout for requests to the 'PUT /connectors/{connector}/fence' endpoint, or just that it won't control the timeout that we use for the call to Admin::fenceProducers?
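
To make alternative 1 a bit more concrete, here is a rough sketch of what a post-validation timeout check could look like. Everything in it is hypothetical: the class name, the validateThenWrite method, and the injected clock are invented for illustration and are not existing Kafka Connect APIs, and the real herder logic is asynchronous and considerably more involved. The only point is the ordering: record a deadline when the request arrives, run validation, and only write the config if the deadline has not already passed.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical illustration only; none of these names exist in Kafka Connect.
class DeadlineAfterValidation {

    /**
     * Runs validation, then writes the connector config only if the request
     * has not already timed out. Returns true if the config was written.
     *
     * @param requestTimeout how long the REST request is allowed to take
     * @param nanoClock      monotonic clock (e.g. System::nanoTime), injected for testing
     * @param validation     stand-in for the (possibly slow) connector config validation
     * @param writeConfig    stand-in for whatever persists the new connector config
     */
    static boolean validateThenWrite(Duration requestTimeout,
                                     LongSupplier nanoClock,
                                     Runnable validation,
                                     Runnable writeConfig) {
        long deadlineNanos = nanoClock.getAsLong() + requestTimeout.toNanos();

        validation.run();

        // Compare via subtraction so the check stays correct if nanoTime wraps around.
        if (nanoClock.getAsLong() - deadlineNanos > 0) {
            // The caller has already been sent a timeout response, so abort
            // instead of creating/modifying the connector behind their back.
            return false;
        }

        writeConfig.run();
        return true;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L};
        // Validation "takes" 120s against a 90s timeout, so nothing is written.
        boolean written = validateThenWrite(
                Duration.ofSeconds(90),
                () -> fakeNow[0],
                () -> fakeNow[0] += Duration.ofSeconds(120).toNanos(),
                () -> System.out.println("config written"));
        System.out.println("written = " + written);
    }
}
```

With a check along these lines, a request that has already timed out from the user's perspective would never result in a connector being created or modified after the fact.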

Cheers,

Chris

On Wed, Nov 2, 2022 at 10:07 AM Yash Mayya <yash.ma...@gmail.com> wrote:

> Hi all,
>
> I'd like to start a discussion thread on this small KIP -
> https://eu01.z.antigena.com/l/8mqVuSEhdZF05LKGk7J1Cb6esUWht6ps4-73ocd196fgyxC~lGhTLqBEt9BPJy1MBx6RzKVi7qRYYpJwCHRixlmdcayj7YMv~-0fIZJ2f8EjEhp-Qck3~O3Ga1JBTgaY4tTAlwh0VjF6217iBaGVG2PNa-2TO5z8zC5s0Nb28kcG2mUg1tcEfzsAtmWEXWRECoeD4cdUXyPvWeFtYL-ccVv-PmQydgSIYkdeR7~XItAXezft9Mfs5pzjejXSK~MhWW
>
> It proposes the addition of a new Kafka Connect worker configuration to
> allow configuring REST API request timeouts.
>
> Thanks,
> Yash