No, this is all within the same DC. I think the problem has to do with the LB. We've upgraded our producers to point directly to a node for testing, and after running it all night I don't see any more connections than there are supposed to be.
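(For anyone wanting to run the same comparison, here is a rough broker-side check. It assumes the broker listens on Kafka's default port 9092, which netstat displays by its /etc/services name, XmlIpcRegSvc:)

    # Count ESTABLISHED connections on the Kafka port (9092), grouped by
    # client IP, so the totals can be compared with what each producer
    # host reports for itself.
    netstat -tan | awk '$6 == "ESTABLISHED" && $4 ~ /:9092$/ {print $5}' \
      | sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn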
Can I ask which LB you are using? We are using A10s.

On Sep 26, 2013, at 6:41 PM, Nicolas Berthet <nicolasbert...@maaii.com> wrote:

> Hi Mark,
>
> I'm using CentOS 6.2. My file limit is something like 500k; the value is arbitrary.
>
> One of the things I changed so far is the TCP keepalive parameters, with moderate success:
>
> net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_intvl
> net.ipv4.tcp_keepalive_probes
>
> I still notice an abnormal number of ESTABLISHED connections. I've been doing some searching and came across this page: http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/
>
> I'll change "net.netfilter.nf_conntrack_tcp_timeout_established" as indicated there; it looks closest to a solution for my issue.
>
> Are you also experiencing the issue in a cross-data-center context?
>
> Best regards,
>
> Nicolas Berthet
>
> -----Original Message-----
> From: Mark [mailto:static.void....@gmail.com]
> Sent: Friday, September 27, 2013 6:08 AM
> To: users@kafka.apache.org
> Subject: Re: Too many open files
>
> What OS settings did you change? How high is your "huge" file limit?
>
> On Sep 25, 2013, at 10:06 PM, Nicolas Berthet <nicolasbert...@maaii.com> wrote:
>
>> Jun,
>>
>> I observed a similar kind of thing recently. (I didn't notice before because our file limit is huge.)
>>
>> I have a set of brokers in one datacenter, and producers in different datacenters.
>>
>> At some point I got disconnections. From the producer's perspective there were something like 15 connections to the broker; on the broker side, however, I observed hundreds of connections from that producer in an ESTABLISHED state.
>>
>> We had some default settings for the socket timeout at the OS level, which we reduced hoping it would prevent the issue in the future. I'm not sure whether the issue is in the broker or the OS configuration, though. I'm still keeping the broker under observation for the time being.
>>
>> Note that for clients in the same datacenter we didn't see this issue; the socket count matches on both ends.
>>
>> Nicolas Berthet
>>
>> -----Original Message-----
>> From: Jun Rao [mailto:jun...@gmail.com]
>> Sent: Thursday, September 26, 2013 12:39 PM
>> To: users@kafka.apache.org
>> Subject: Re: Too many open files
>>
>> If a client is gone, the broker should automatically close those broken sockets. Are you using a hardware load balancer?
>>
>> Thanks,
>>
>> Jun
>>
>> On Wed, Sep 25, 2013 at 4:48 PM, Mark <static.void....@gmail.com> wrote:
>>
>>> FYI, if I kill all producers I don't see the number of open files drop; I still see all the ESTABLISHED connections.
>>>
>>> Is there a broker setting to automatically kill any inactive TCP connections?
>>>
>>> On Sep 25, 2013, at 4:30 PM, Mark <static.void....@gmail.com> wrote:
>>>
>>>> Any other ideas?
>>>>
>>>> On Sep 25, 2013, at 9:06 AM, Jun Rao <jun...@gmail.com> wrote:
>>>>
>>>>> We haven't seen any socket leaks with the Java producer. If you have lots of unexplained socket connections in the ESTABLISHED state, one possible cause is that the client created new producer instances but didn't close the old ones.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>>> On Wed, Sep 25, 2013 at 6:08 AM, Mark <static.void....@gmail.com> wrote:
>>>>>
>>>>>> No. We are using the kafka-rb Ruby gem producer: https://github.com/acrosa/kafka-rb
>>>>>>
>>>>>> Now that you asked that question, I need to ask:
>>>>>> is there a problem with the Java producer?
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Sep 24, 2013, at 9:01 PM, Jun Rao <jun...@gmail.com> wrote:
>>>>>>>
>>>>>>> Are you using the Java producer client?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jun
>>>>>>>
>>>>>>>> On Tue, Sep 24, 2013 at 5:33 PM, Mark <static.void....@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Our 0.7.2 Kafka cluster keeps crashing with:
>>>>>>>>
>>>>>>>> 2013-09-24 17:21:47,513 - [kafka-acceptor:Acceptor@153] - Error in acceptor
>>>>>>>> java.io.IOException: Too many open files
>>>>>>>>
>>>>>>>> The obvious fix is to bump up the number of open files, but I'm wondering if there is a leak on the Kafka side and/or our application side. We currently have the ulimit set to a generous 4096, but obviously we are hitting this ceiling. What's a recommended value?
>>>>>>>>
>>>>>>>> We are running Rails, and our Unicorn workers are connecting to our Kafka cluster via round-robin load balancing. We have about 1500 workers, so that would be 1500 connections right there, but they should be split across our 3 nodes. Instead, netstat shows thousands of connections that look like this:
>>>>>>>>
>>>>>>>> tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:22503 ESTABLISHED
>>>>>>>> tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:48398 ESTABLISHED
>>>>>>>> tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:29617 ESTABLISHED
>>>>>>>> tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:32444 ESTABLISHED
>>>>>>>> tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:34415 ESTABLISHED
>>>>>>>> tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:56901 ESTABLISHED
>>>>>>>> tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:45349 ESTABLISHED
>>>>>>>>
>>>>>>>> Has anyone come across this problem before? Is this a 0.7.2 leak, an LB misconfiguration...?
>>>>>>>>
>>>>>>>> Thanks
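For reference, the keepalive and conntrack changes Nicolas describes above would look something like this in /etc/sysctl.conf. The values are illustrative, not tested recommendations; kernel defaults are noted in the comments:

    # /etc/sysctl.conf -- illustrative values only
    net.ipv4.tcp_keepalive_time = 600      # idle seconds before the first keepalive probe (default 7200)
    net.ipv4.tcp_keepalive_intvl = 60      # seconds between probes (default 75)
    net.ipv4.tcp_keepalive_probes = 5      # failed probes before the kernel drops the connection (default 9)
    net.netfilter.nf_conntrack_tcp_timeout_established = 3600   # seconds (default 432000, i.e. 5 days)

    # apply without a reboot:
    sysctl -p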