Hi,

We are using the Graphite reporter from Flink 1.2.0 to send metrics via TCP. Due to our network configuration, we cannot use UDP at the moment.
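For context, a Graphite reporter using TCP is enabled in flink-conf.yaml roughly as below. This is a minimal sketch, not our exact configuration: the reporter name `grph`, the host `graphite.example.com`, and port 2003 (Graphite's default plaintext port) are placeholders.

```yaml
# Register a reporter named "grph" (placeholder name)
metrics.reporters: grph
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
# Placeholder host; 2003 is Graphite's default plaintext listener port
metrics.reporter.grph.host: graphite.example.com
metrics.reporter.grph.port: 2003
# TCP instead of UDP, as required by our network setup
metrics.reporter.grph.protocol: TCP
```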
We have observed that if there is any problem with Graphite or the network (basically, if the TCP connection times out), the metrics reporter does not recover. This is easy to reproduce by using iptables to block the port we send the metrics to. If we block the port for more than a minute or so, the problem occurs, and after the port is reopened, Flink does not resume reporting metrics as before.

Is this a known issue? Googling turns up some problems with the metrics-graphite package that should already have been fixed. We have tried updating metrics-core and metrics-graphite to the latest versions, with no success.

Any ideas?

Thanks!
Bruno