Hmm, this sounds like a serious bug. I remember a ticket reporting a similar issue, but I cannot find it now. Let me dig a bit deeper later.

BTW, could you try out the 0.8.2 broker version and see if this is still easily reproducible, i.e. by starting a bunch of producers, letting them send data for a while, and then terminating them?

Guozhang

On Tue, Mar 10, 2015 at 1:00 PM, Allen Wang <aw...@netflix.com.invalid> wrote:

> Hello,
>
> We are using Kafka 0.8.1.1 on the broker and the 0.8.2 producer on the client.
> After running for a few days, we have found that there are way too many
> open file descriptors on the broker side. When we compare the connections
> on the client side, we find that some connections are already gone on the
> client but still exist on the broker. There are also connections on the
> broker whose producer instances have already terminated.
>
> We then ran netstat -o and found that the connections on the broker side
> do not have keep-alive enabled (the timer column shows "off"):
>
> tcp6 0 0 kafka-xyz:7101 ip-a-b-c-d:33471 ESTABLISHED off (0.00/0/0)
>
> We suspect that because there is no keep-alive on the broker, there is no
> probing of the idle connections and therefore no connection cleanup.
>
> There is a default 2-hour TCP keep-alive set at the OS level on both
> sides:
>
> net.ipv4.tcp_keepalive_time = 7200
>
> On the producer side, keep-alive is enabled on the connection:
>
> tcp6 0 0 ip-a-b-c-d:33471 kafka-xyz.:7101 ESTABLISHED keepalive (975.50/0/0)
>
> Is there any way to clean up the idle producer connections on the broker
> side? Does keep-alive help with cleaning up the idle connections?
>
> Thanks,
> Allen
>

--
-- Guozhang
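
For anyone following the thread, here is a rough sketch of what "keep-alive enabled on the connection" means at the socket level. It is plain Python and not Kafka code; the helper names (`enable_keepalive`, `keepalive_enabled`, `demo`) and the timer values are made up for illustration. The point it tries to show is that the OS-level `net.ipv4.tcp_keepalive_time` setting only takes effect on sockets that have `SO_KEEPALIVE` turned on, which would explain why the broker side shows "off" while the producer side shows "keepalive".

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keep-alive for one socket (hypothetical helper).

    The timer values are illustrative assumptions, not recommendations.
    """
    # Without this option the kernel never probes the connection,
    # regardless of net.ipv4.tcp_keepalive_time.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timer overrides are platform specific, so only set
    # the ones this platform exposes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def keepalive_enabled(sock):
    """Report whether SO_KEEPALIVE is set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def demo():
    """Accept one loopback connection and enable keep-alive on the server side."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    before = keepalive_enabled(conn)  # state of a freshly accepted socket
    enable_keepalive(conn)
    after = keepalive_enabled(conn)

    for s in (conn, client, listener):
        s.close()
    return before, after
```

Note that keep-alive probes only let the kernel detect a peer that has gone away without closing the connection. They do not close connections that are idle but whose peer is still alive and answering probes.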