Hello,

Is it correct that producers do not fail new connection establishment when
it exceeds the request timeout?

Running on AWS, we've encountered a problem where certain very low volume
producers end up with metadata that's sufficiently stale that they attempt
to establish a connection to a broker instance that has already been
terminated as part of a maintenance operation. I would expect this to fail
and be retried normally, but it appears to hang until the system-level TCP
connection timeout is reached (2-3 minutes), with the writes themselves
being expired before even a single attempt is made to send them.

We've worked around the issue by setting `metadata.max.age.ms` extremely
low, such that these producers are requesting new metadata much faster than
our maintenance operations are terminating instances. While this does work,
it seems like an unfortunate workaround for some very surprising behavior.

Thanks,
Luke

Reply via email to