Jiangjie,

Great start. I have a couple of comments.

Under the motivation section, is it really true that the request will never be completed? Presumably if the broker goes down the connection will be severed, at worst by a TCP timeout, which should clean up the connection and any outstanding requests, right? I think the real reason we need a different timeout is that the default TCP timeouts are ridiculously long in this context.

My second question is about whether this is the right level to tackle the issue and what user-facing changes need to be made. A related problem came up in https://issues.apache.org/jira/browse/KAFKA-1788, where producer records get stuck indefinitely because there's no client-side timeout. This KIP wouldn't fix that problem, or any problem caused by lack of connectivity, since it would only apply to in-flight requests, which by definition must have been sent on an active connection.

I suspect both types of problems probably need to be addressed separately by introducing explicit timeouts. However, because the settings introduced here are very much about the internal implementations of the clients, I'm wondering if this even needs to be a user-facing setting, especially if we have to add other timeouts anyway. For example, would a fixed, generous value that's still much shorter than a TCP timeout, say 15s, be good enough? If other timeouts would allow, for example, the clients to properly exit even if requests have not hit their timeout, then what's the benefit of being able to configure the request-level timeout?

I know we have a similar setting, max.in.flight.requests.per.connection, exposed publicly (which I just discovered is missing from the new producer configs documentation). But it looks like the new consumer is not exposing that option, using a fixed value instead. I think we should default to hiding these implementation values unless there's a strong case for a scenario that requires customization.
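To make the fixed-timeout idea concrete, here is a rough sketch in Python of what I have in mind. This is not the actual NetworkClient code: the class and method names are made up for illustration, and the 15s constant is just the example value mentioned above. It assumes requests are sent in non-decreasing time order on a single connection.

```python
from collections import OrderedDict

# Hypothetical fixed, non-configurable timeout (the "say 15s" example value).
REQUEST_TIMEOUT_MS = 15_000


class InFlightRequests:
    """Tracks requests sent on one connection and expires the stale ones."""

    def __init__(self, timeout_ms=REQUEST_TIMEOUT_MS):
        self.timeout_ms = timeout_ms
        # request id -> send time in ms; insertion order equals send order
        self._sent = OrderedDict()

    def send(self, request_id, now_ms):
        # Only requests actually written to a connection are tracked here.
        self._sent[request_id] = now_ms

    def complete(self, request_id):
        # A response arrived, so the request is no longer in flight.
        self._sent.pop(request_id, None)

    def expire(self, now_ms):
        """Remove and return the ids of requests older than the timeout."""
        expired = []
        while self._sent:
            request_id, sent_ms = next(iter(self._sent.items()))
            if now_ms - sent_ms < self.timeout_ms:
                break  # oldest request is still fresh, so the rest are too
            self._sent.popitem(last=False)
            expired.append(request_id)
        return expired
```

Note that a record that was never sent, because there is no usable connection, never enters this structure at all. That is why a timeout at this level would not help with the KAFKA-1788 case.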
In other words, since the only user-facing change was the addition of the setting, I'm wondering if we can avoid the KIP altogether by just choosing a good default value for the timeout.

-Ewen

On Mon, Apr 13, 2015 at 2:35 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

> Hi,
>
> I just created a KIP to add a request timeout to NetworkClient for new
> Kafka clients.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient
>
> Comments and suggestions are welcome!
>
> Thanks.
>
> Jiangjie (Becket) Qin

--
Thanks,
Ewen