Hi,

Do you know if a retry tries to use the same Broker connection or may 
reinitialize that connection too?

Thanks,
- Sini



From:   "Peter Sinoros Szabo" <peter.sinoros-sz...@hu.ibm.com>
To:     users@kafka.apache.org
Date:   2017/05/25 17:01
Subject:        vpn vs TimeoutException



Hi,

Please help me to understand the following situation and to fix the 
problem.
My servers needs a vpn connection to access the kafka brokers. The vpn 
connection restarts periodically and in some of these cases, the kafka 
producer send call's callback return with TimeoutExceptions. I'd like to 
prevent these exceptions and rely on the kafka client's retry feature to 
deliver the messages reliably.

>From the logs I see that the vpn connection restarts in 4 seconds between 
May 25 05:30:55 and May 25 05:30:59 while it runs several ip command like 
"ip route del..., ip add del..., ip link set..., ip addr add..., ip route 
add..."  and after that from 05:31:19.793 I get kafka errors, see logs 
below. 

What I do not understand is that the first NETWORK_EXCEPTION comes much 
after the VPN is operational again, it says that it will retry and after 
that I got the TimeoutExceptions. From the retry count (119 left) I 
thought it would keep retrying for a while, backoff is 300ms, so it should 

take 36 seconds. But the first TimeoutException is 1-7 seconds after the 
NETWORK_EXCEPTION message. Maybe those two are unrelated. I suppose the 
TCP connection is gone while the vpn connection restarts, but it seems 
that the kafka client does not try to reestablish the connection, I do not 

know if it should, but it would be great. Other clients from other 
machines are fine, brokers is ok also.

So please shed some light on this, mostly if this can be avoided with some 

simple configuration :), that would be great.

The Kafka client related logs:
17/05/25 05:31:19.793 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282068 on topic-partition uhs-updates-7, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.793 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282067 on topic-partition uhs-updates-1, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.793 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282066 on topic-partition uhs-updates-7, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.793 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282065 on topic-partition uhs-updates-1, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282063 on topic-partition uhs-updates-1, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282077 on topic-partition uhs-updates-5, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282076 on topic-partition uhs-updates-8, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282075 on topic-partition uhs-updates-5, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282074 on topic-partition uhs-updates-2, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:19.794 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282064 on topic-partition uhs-updates-2, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282073 on topic-partition uhs-updates-9, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282072 on topic-partition uhs-updates-6, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282071 on topic-partition uhs-updates-9, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282070 on topic-partition uhs-updates-6, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:25.539 WARN  [kafka-producer-network-thread | producer-1] 
[] o.a.k.c.producer.internals.Sender - Got error produce response with 
correlation id 34282069 on topic-partition uhs-updates-3, retrying (119 
attempts left). Error: NETWORK_EXCEPTION
17/05/25 05:31:26.407 ERROR [kafka-producer-network-thread | producer-1] 
[] t.ustream.kafka.client.AsyncProducer - uhs-updates:[22789631] 
org.apache.kafka.common.errors.TimeoutException
org.apache.kafka.common.errors.TimeoutException: Expiring 51 record(s) for 

uhs-updates-0 due to 30003 ms has passed since batch creation plus linger 
time

This last line repeats about 200 times, with record numbers varying 
between 48 and 51, partition numbers are [0,3,6,9] and the ms value is 
somewhat larger then 30000. 22789631 is the message key, uhs-updates is 
the topic name. For the full list see: https://pastebin.com/raw/4Yz1yTJU

Thanks,
- Sini





Reply via email to