I wrote a simplified test program that creates 10 producers, sends a few messages from each, and then goes idle. For both 0.8.1.1 and 0.8.2.0, the connections on the brokers are gone once the producer instance is terminated.
In our prod environment, where there are many more producer instances, we have seen the file descriptor count go up to the limit. It does go down sometimes, but the decreases are smaller than the increases, so we eventually hit the limit. Analyzing the source IP addresses on the broker side shows IP addresses that are no longer in use.

We will try 0.8.2.1 some time later.

Thanks,
Allen

On Wed, Mar 11, 2015 at 10:05 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hmm, this sounds like a serious bug. I do remember we have some ticket
> reporting similar issues before but I cannot find it now. Let me dig a bit
> deeper later.
>
> BTW, could you try out the 0.8.2 broker version and see if this is still
> easily re-producible, i.e. starting a bunch of producers to send data for a
> while, and terminate them?
>
> Guozhang
>
> On Tue, Mar 10, 2015 at 1:00 PM, Allen Wang <aw...@netflix.com.invalid>
> wrote:
>
> > Hello,
> >
> > We are using Kafka 0.8.1.1 on the broker and 0.8.2 producer on the
> > client. After running for a few days, we have found that there are way
> > too many open file descriptors on the broker side. When we compare the
> > connections on the client side, we found some connections are already
> > gone on the client but still exists on the broker. Also there are
> > connections on the broker where the producer instances are already
> > terminated.
> >
> > We then did a netstat -o and found that the connections on the broker
> > side does not have keep-alive enabled (as timewait is "off"):
> >
> > tcp6 0 0 kafka-xyz:7101 ip-a-b-c-d:33471 ESTABLISHED off (0.00/0/0)
> >
> > We suspect that because there is no keep-alive on the broker, there is
> > no probing on the idle connections and therefore no connection clean up.
> >
> > There is a default 2 hours TCP keep alive set on the OS level on both
> > sides:
> >
> > net.ipv4.tcp_keepalive_time = 7200
> >
> > On the producer side, keepalive is enabled on the connection:
> >
> > tcp6 0 0 ip-a-b-c-d:33471 kafka-xyz.:7101 ESTABLISHED keepalive (975.50/0/0)
> >
> > Is there anyway to clean up the idle producer connections on the broker
> > side? Does keepalive helps cleaning up the idle connections?
> >
> > Thanks,
> > Allen
>
> --
> -- Guozhang
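
On the keep-alive question in the quoted message, here is a minimal socket-level sketch of what enabling keep-alive on the accepting side of a connection looks like. It is not Kafka broker code and makes no claim about how the broker configures its sockets; the names accept_with_keepalive and demo are made up for illustration, and the 7200-second idle value just mirrors the net.ipv4.tcp_keepalive_time setting quoted above. Note that keep-alive probes only help the OS notice a peer that has disappeared; a peer that is idle but still alive answers the probes, so its connection stays open.

```python
import socket


def accept_with_keepalive(server_sock, idle_seconds=7200):
    """Accept one connection and enable TCP keep-alive on it.

    Hypothetical helper for illustration only. idle_seconds is the idle
    time before the first probe; it is applied only where the platform
    exposes TCP_KEEPIDLE (e.g. Linux), otherwise the OS default is used.
    """
    conn, addr = server_sock.accept()
    # Without SO_KEEPALIVE the kernel never probes an idle connection,
    # which is what "off" in the netstat -o timer column reflects.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return conn, addr


def demo():
    """Open a loopback connection and report whether keep-alive is set."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = accept_with_keepalive(server)
    enabled = conn.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    for s in (conn, client, server):
        s.close()
    return enabled
```

Calling demo() checks the option on the accepted (server-side) socket, which is the side that showed "off" in the netstat output above. With the 2-hour default, a dead peer would only be detected after the idle period plus the probe retries, so a shorter idle time would be needed for faster clean-up.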