Java / OS info:
----------
java.specification.version = 1.8
java.vendor = Oracle Corporation
java.version = 1.8.0_45
Oracle Linux Server release 6.7
kernel version 2.6.32-573.18.1.el6.x86_64
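In the quoted reply further down, Jaikiran asks for the Java vendor and version and which descriptors are open. Below is a minimal sketch, not part of the original report, that prints the runtime properties above along with the JVM's own open and maximum file-descriptor counts. `RuntimeInfo` is a hypothetical helper name. The descriptor counts come from `com.sun.management.UnixOperatingSystemMXBean`, a JDK-specific extension present on Oracle/OpenJDK builds for Unix-like systems, so the sketch omits them on other JVMs. It only reports on the JVM it runs in. For a running broker, the same value is exposed as the `OpenFileDescriptorCount` attribute of the `java.lang:type=OperatingSystem` MBean, which a JMX client such as jconsole can read if remote JMX is enabled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the runtime details asked for on the list,
// plus this JVM's file-descriptor usage where the JDK exposes it.
final class RuntimeInfo {
    static Map<String, String> describe() {
        Map<String, String> info = new LinkedHashMap<>();
        for (String key : new String[] {
                "java.specification.version", "java.vendor", "java.version",
                "os.name", "os.version"}) {
            info.put(key, System.getProperty(key));
        }
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // JDK-specific extension (not Java SE API); absent on some JVMs/platforms.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            info.put("open.fds", Long.toString(unix.getOpenFileDescriptorCount()));
            info.put("max.fds", Long.toString(unix.getMaxFileDescriptorCount()));
        }
        return info;
    }

    public static void main(String[] args) {
        describe().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

On Linux, `os.version` reports the kernel release. The distribution name (for example, "Oracle Linux Server release 6.7") is not available from a Java system property.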
Redacted LSOF
---------------------

~46K Close Waits
------------------
java 4692 kafka 2618u IPv6 264581081 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host1:33089 (CLOSE_WAIT)
java 4692 kafka 2619u IPv6 264581082 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host2:37371 (CLOSE_WAIT)
java 4692 kafka 2621u IPv6 264600187 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host3:40788 (CLOSE_WAIT)

475 Established connections
----------------------------
java 4692 kafka *427u IPv6 282382725 0t0 TCP XX-XXXX-kafka01:54099->XX-XXXX-host1:eforward (ESTABLISHED)
java 4692 kafka *639u IPv6 282426735 0t0 TCP XX-XXXX-kafka01:36157->XX-XXXX-kafka01:59964 (ESTABLISHED)
java 4692 kafka *860u IPv6 282480072 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host2:50547 (ESTABLISHED)
java 4692 kafka *507u IPv6 282481853 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host3:45096 (ESTABLISHED)

~3K
----------------------------
java 4692 kafka 2367u REG 253,3 104857335 141033710 /XXX/kafka/LOG/__consumer_offsets-10/00000000000035177234.log

~1.5K
----------------------------
java 4692 kafka mem REG 253,3 10485760 141297356 /XXX/kafka/LOG/TOPIC-1-9/00000000000000028243.index

~1.5K
----------------------------
java 4692 kafka 818u REG 253,3 2548089 141297556 /XXX/kafka/LOG/TOPIC-1-2-76/00000000000000146894.log
java 4692 kafka 819u REG 253,3 0 141165545 /XXX/kafka/LOG/TOPIC-2-2-11/00000000000000000000.log

On Fri, Aug 26, 2016 at 6:37 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:

> Which Java vendor and version are you using at runtime? Also, what OS is
> this? Can you get the lsof output (on Linux) and paste it somewhere (like
> a gist) so we can see which descriptors are open?
>
> -Jaikiran
>
>
> On Friday 26 August 2016 02:49 AM, Bharath Srinivasan wrote:
>
>> Hello:
>>
>> We are running a data pipeline application stack using Kafka 0.8.2.2 in
>> production.
>> We have been seeing intermittent CLOSE_WAIT connections on our Kafka
>> brokers, and they use up the available file handles very quickly. By the
>> time the open file count reaches around 40K, the node becomes unresponsive
>> and we see huge GC pauses. The only way out has been to restart the node.
>> When the nodes are healthy, the open file count stays around 6K during
>> peak load and around 3K on average.
>>
>> Configurations:
>> - 5-broker cluster (per-node spec: 24-core processors, 250 GB RAM,
>>   256 GB SSD)
>> - 20 topics and 1100 partitions across all topics
>> - Replication factor of 3
>> - Java-based KafkaProducer and high-level consumers
>>   (ZookeeperConsumerConnector)
>> - GC params { -Xmx32G -Xms4G -server -XX:MetaspaceSize=96m -XX:+UseG1GC
>>   -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
>>   -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50
>>   -XX:MaxMetaspaceFreeRatio=80 }
>>
>> Any pointers here? We'd appreciate your help.
>>
>> Thanks,
>> Bharath
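Per-category counts like the redacted LSOF summary at the top of this message can be produced roughly by piping `lsof -p <broker pid>` through a small tally. The sketch below uses a hypothetical class, `LsofTally`, and is not how the counts above were actually collected. It groups socket lines by the TCP state lsof prints in parentheses (`CLOSE_WAIT`, `ESTABLISHED`, ...). It also separates regular files held open through a descriptor (`818u`) from memory-mapped ones (`mem` in the FD column, as with the `.index` file). It assumes lsof's default column order (COMMAND PID USER FD TYPE ...).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts lsof lines by TCP state or regular-file kind.
final class LsofTally {
    // For TCP sockets, lsof appends the connection state, e.g. "(CLOSE_WAIT)".
    private static final Pattern TCP_STATE =
            Pattern.compile("\\sTCP\\s.*\\(([A-Z0-9_]+)\\)\\s*$");

    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 5) {
                continue; // blank or truncated line
            }
            String key = null;
            Matcher m = TCP_STATE.matcher(line);
            if (m.find()) {
                key = "TCP " + m.group(1);
            } else if ("REG".equals(f[4])) {
                // FD column "mem" = memory-mapped file; otherwise a real descriptor.
                key = "mem".equals(f[3]) ? "REG mmap" : "REG open fd";
            }
            if (key != null) {
                counts.merge(key, 1, Integer::sum); // header and other types are skipped
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        for (String l; (l = in.readLine()) != null; ) {
            lines.add(l);
        }
        tally(lines).forEach((k, v) -> System.out.println(v + "\t" + k));
    }
}
```

Usage: `lsof -n -P -p <pid> | java LsofTally`. The `-n` and `-P` flags make lsof skip host-name and port-name lookups, which speeds it up considerably on a process holding tens of thousands of sockets. They also print numeric ports instead of service names such as `XmlIpcRegSvc`.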