Hi,

Do you know whether a retry reuses the same broker connection, or whether it may re-establish that connection as well?
Thanks,
- Sini

From: "Peter Sinoros Szabo" <peter.sinoros-sz...@hu.ibm.com>
To: users@kafka.apache.org
Date: 2017/05/25 17:01
Subject: vpn vs TimeoutException

Hi,

Please help me understand the following situation and fix the problem.

My servers need a VPN connection to reach the Kafka brokers. The VPN connection restarts periodically, and in some of these cases the callback of the Kafka producer's send call returns with a TimeoutException. I'd like to prevent these exceptions and rely on the Kafka client's retry feature to deliver the messages reliably.

From the logs I see that the VPN connection restarts within 4 seconds, between May 25 05:30:55 and May 25 05:30:59, while it runs several ip commands like "ip route del..., ip add del..., ip link set..., ip addr add..., ip route add...". After that, starting at 05:31:19.793, I get Kafka errors (see the logs below).

What I do not understand is that the first NETWORK_EXCEPTION arrives about 20 seconds after the VPN is operational again. The warning says the client will retry, but after that I get the TimeoutExceptions. Judging from the retry count (119 attempts left), I expected it to keep retrying for a while: with a 300 ms backoff, that should take 36 seconds. But the first TimeoutException comes only 1-7 seconds after the NETWORK_EXCEPTION messages. Maybe the two are unrelated.

I suppose the TCP connection is lost while the VPN connection restarts, but it seems that the Kafka client does not try to re-establish it. I do not know whether it should, but it would be great if it did. Clients on other machines are fine, and the brokers are OK too.

So please shed some light on this; ideally, tell me whether it can be avoided with some simple configuration :)

The Kafka client-related logs:

17/05/25 05:31:19.793 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282068 on topic-partition uhs-updates-7, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.793 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282067 on topic-partition uhs-updates-1, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.793 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282066 on topic-partition uhs-updates-7, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.793 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282065 on topic-partition uhs-updates-1, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282063 on topic-partition uhs-updates-1, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282077 on topic-partition uhs-updates-5, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282076 on topic-partition uhs-updates-8, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282075 on topic-partition uhs-updates-5, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282074 on topic-partition uhs-updates-2, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282064 on topic-partition uhs-updates-2, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282073 on topic-partition uhs-updates-9, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282072 on topic-partition uhs-updates-6, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282071 on topic-partition uhs-updates-9, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282070 on topic-partition uhs-updates-6, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN [kafka-producer-network-thread | producer-1] [] o.a.k.c.producer.internals.Sender - Got error produce response with correlation id 34282069 on topic-partition uhs-updates-3, retrying (119 attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:26.407 ERROR [kafka-producer-network-thread | producer-1] [] t.ustream.kafka.client.AsyncProducer - uhs-updates:[22789631] org.apache.kafka.common.errors.TimeoutException
org.apache.kafka.common.errors.TimeoutException: Expiring 51 record(s) for uhs-updates-0 due to 30003 ms has passed since batch creation plus linger time

This last entry repeats about 200 times, with record counts between 48 and 51, partitions 0, 3, 6 and 9, and an elapsed time slightly above 30000 ms. 22789631 is the message key, and uhs-updates is the topic name. For the full list, see: https://pastebin.com/raw/4Yz1yTJU

Thanks,
- Sini
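The 36-second retry estimate in the message can be set side by side with the expiry error using a small Java sketch. It assumes the producer ran with retries=120 (the first warning reports 119 attempts left), retry.backoff.ms=300, linger.ms=0 and request.timeout.ms at 30000 ms, which matches the roughly 30003 ms in the expiry message. These values are inferred from the logs, not taken from the actual configuration, and the class and method names are hypothetical. Note that the expiry message counts time since batch creation plus linger time, not since the last retry, so on that clock the retry count does not seem to extend the deadline.

```java
import java.util.Properties;

public class RetryBudget {

    // Hypothetical reconstruction of the producer settings implied by the logs
    // (assumption, not the poster's real config): 120 retries, 300 ms backoff,
    // request.timeout.ms of 30000 ms, linger.ms of 0.
    static Properties assumedConfig() {
        Properties props = new Properties();
        props.put("retries", "120");
        props.put("retry.backoff.ms", "300");
        props.put("request.timeout.ms", "30000");
        props.put("linger.ms", "0");
        return props;
    }

    // Time spent only in backoff sleeps if every retry is used;
    // ignores how long each produce request itself takes.
    static long backoffWindowMs(Properties props) {
        long retries = Long.parseLong(props.getProperty("retries"));
        long backoffMs = Long.parseLong(props.getProperty("retry.backoff.ms"));
        return retries * backoffMs;
    }

    // Budget after which, per the wording of the expiry message, a batch can be
    // expired: request.timeout.ms counted from batch creation plus linger time.
    static long expiryBudgetMs(Properties props) {
        return Long.parseLong(props.getProperty("request.timeout.ms"))
                + Long.parseLong(props.getProperty("linger.ms"));
    }

    public static void main(String[] args) {
        Properties props = assumedConfig();
        long window = backoffWindowMs(props);
        long expiry = expiryBudgetMs(props);
        System.out.println("backoff-only retry window: " + window + " ms");
        System.out.println("batch expiry budget:       " + expiry + " ms");
        System.out.println("retries outlast expiry?    " + (window > expiry));
    }
}
```

Under these assumptions it prints a 36000 ms backoff window against a 30000 ms expiry budget, so the retries could not all run before the batch deadline. If that reading is right, the setting that lengthens the deadline would be request.timeout.ms (or, on clients from Kafka 2.1 onward, delivery.timeout.ms) rather than retries; this is a hypothesis to check against the client version in use.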