We fixed similar issues in the 0.8.2.0 release. You should consider moving to the latest release.
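Until you can upgrade, it may help to track which peers hold the stuck connections on the affected broker. The following is only a minimal sketch in Python, assuming netstat output in the same column layout as the lines quoted below (protocol, Recv-Q, Send-Q, local address, remote address, state, PID/program). The function name `count_close_wait` and the sample text are illustrative and not part of Kafka; the last sample row (the ESTABLISHED one) is made up to show that other states are ignored.

```python
from collections import Counter


def count_close_wait(netstat_text, local_port=9092):
    """Count CLOSE_WAIT sockets on local_port, grouped by remote host."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state[, pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        if not local.endswith(f":{local_port}"):
            continue
        # Drop the ephemeral port so connections group by peer host.
        remote_host = remote.rsplit(":", 1)[0]
        counts[remote_host] += 1
    return counts


# First three rows are taken from the quoted netstat output;
# the ESTABLISHED row is a hypothetical example.
SAMPLE = """\
tcp 70 0 192.168.10.132:9092 192.168.10.131:34266 CLOSE_WAIT 17193/java
tcp 70 0 192.168.10.132:9092 192.168.10.134:58585 CLOSE_WAIT 17193/java
tcp 253 0 192.168.10.132:9092 192.168.10.130:50909 CLOSE_WAIT 17193/java
tcp 0 0 192.168.10.132:9092 192.168.10.133:40000 ESTABLISHED 17193/java
"""

if __name__ == "__main__":
    for host, n in count_close_wait(SAMPLE).most_common():
        print(host, n)
```

Running this periodically against fresh netstat output would show whether the CLOSE_WAIT count grows steadily and whether it is concentrated on particular peers, which is useful to include in any follow-up report.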
On Thu, Apr 21, 2016 at 1:11 PM, wanghai <whp...@outlook.com> wrote:

> Hello,
>
> After the Kafka cluster has been running for a while, the cluster gets
> stuck. Consumers can't read messages from the cluster.
>
> The Kafka cluster has 5 brokers: 0, 131, 132, 133, 134. The Kafka version
> is 2.8.0-0.8.1.1.
>
> I find that broker server 132 has too many CLOSE_WAIT TCP connections,
> while the other brokers have none. The count keeps increasing until it
> reaches the "unix max open files" limit, and the broker is killed for
> having too many open files.
>
> My "unix max open files" is 60000, which I think is enough.
>
> tcp   70  0  192.168.10.132:9092  192.168.10.131:34266  CLOSE_WAIT  17193/java
> tcp   70  0  192.168.10.132:9092  192.168.10.134:58585  CLOSE_WAIT  17193/java
> tcp   70  0  192.168.10.132:9092  192.168.10.134:56025  CLOSE_WAIT  17193/java
> tcp   70  0  192.168.10.132:9092  192.168.10.131:50139  CLOSE_WAIT  17193/java
> tcp   62  0  192.168.10.132:9092  192.168.10.131:49371  CLOSE_WAIT  17193/java
> tcp  253  0  192.168.10.132:9092  192.168.10.130:50909  CLOSE_WAIT  17193/java
> tcp   62  0  192.168.10.132:9092  192.168.10.134:50905  CLOSE_WAIT  17193/java
> tcp   70  0  192.168.10.132:9092  192.168.10.134:50393  CLOSE_WAIT  17193/java
> tcp   72  0  192.168.10.132:9092  192.168.10.130:47837  CLOSE_WAIT  17193/java
> tcp   70  0  192.168.10.132:9092  192.168.10.134:47321  CLOSE_WAIT  17193/java
> tcp    1  0  192.168.10.132:9092  192.168.10.134:46809  CLOSE_WAIT  17193/java
>
> The broker server 132 logs:
>
> [2016-04-20 01:09:48,736] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
> [2016-04-20 01:09:49,332] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
> [2016-04-20 01:09:51,523] ERROR Closing socket for /192.168.10.133 because of error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>     at kafka.utils.Utils$.read(Utils.scala:375)
>     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>     at kafka.network.Processor.read(SocketServer.scala:347)
>     at kafka.network.Processor.run(SocketServer.scala:245)
>     at java.lang.Thread.run(Thread.java:619)
> [2016-04-20 01:09:54,023] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:56,285] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:56,968] ERROR Closing socket for /192.168.10.133 because of error (kafka.network.Processor)
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcher.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:75)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>     at kafka.api.PartitionDataSend.writeTo(FetchResponse.scala:67)
>     at kafka.network.MultiSend.writeTo(Transmission.scala:102)
>     at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:124)
>     at kafka.network.MultiSend.writeTo(Transmission.scala:102)
>     at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:219)
>     at kafka.network.Processor.write(SocketServer.scala:375)
>     at kafka.network.Processor.run(SocketServer.scala:247)
>     at java.lang.Thread.run(Thread.java:619)
> [2016-04-20 01:09:56,971] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
> [2016-04-20 01:09:57,328] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:09:57,682] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
> [2016-04-20 01:09:57,683] ERROR Closing socket for /192.168.10.131 because of error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>     at kafka.utils.Utils$.read(Utils.scala:375)
>     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>     at kafka.network.Processor.read(SocketServer.scala:347)
>     at kafka.network.Processor.run(SocketServer.scala:245)
>     at java.lang.Thread.run(Thread.java:619)
> [2016-04-20 01:09:57,748] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:57,921] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
> [2016-04-20 01:09:58,099] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:58,116] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:09:58,163] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:09:58,442] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:58,541] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:09:58,542] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
> [2016-04-20 01:09:58,740] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:58,740] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:09:58,915] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
> [2016-04-20 01:09:58,915] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:58,916] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:09:58,980] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
> [2016-04-20 01:09:58,980] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:58,980] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
> [2016-04-20 01:09:59,115] ERROR Closing socket for /192.168.10.133 because of error (kafka.network.Processor)
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcher.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:75)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>     at kafka.api.PartitionDataSend.writeTo(FetchResponse.scala:67)
>     at kafka.network.MultiSend.writeTo(Transmission.scala:102)
>     at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:124)
>     at kafka.network.MultiSend.writeTo(Transmission.scala:102)
>     at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:219)
>     at kafka.network.Processor.write(SocketServer.scala:375)
>     at kafka.network.Processor.run(SocketServer.scala:247)
>     at java.lang.Thread.run(Thread.java:619)
> [2016-04-20 01:09:59,115] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:59,115] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:09:59,329] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
> [2016-04-20 01:09:59,329] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
> [2016-04-20 01:09:59,329] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
> [2016-04-20 01:09:59,332] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
> [2016-04-20 01:13:43,821] INFO Partition [realtime_hardware,6] on broker 132: Shrinking ISR for partition [realtime_hardware,6] from 132,134,131 to 132 (kafka.cluster.Partition)
> [2016-04-20 01:13:43,822] INFO Partition [realtime_hardware_meta,9] on broker 132: Shrinking ISR for partition [realtime_hardware_meta,9] from 132,133,131 to 132 (kafka.cluster.Partition)
> [2016-04-20 01:13:43,823] INFO Partition [realtime_expansion,5] on broker 132: Shrinking ISR for partition [realtime_expansion,5] from 132,133 to 132 (kafka.cluster.Partition)
> [2016-04-20 01:13:43,824] INFO Partition [realtime_capacity,11] on broker 132: Shrinking ISR for partition [realtime_capacity,11] from 132,134,131 to 132 (kafka.cluster.Partition)
> [2016-04-20 01:13:43,825] INFO Partition [nginx_log,14] on broker 132: Shrinking ISR for partition [nginx_log,14] from 132,133,131 to 132 (kafka.cluster.Partition)
> [2016-04-20 01:13:43,825] INFO Partition [nginx_log,8] on broker 132: Shrinking ISR for partition [nginx_log,8] from 132,133,131 to 132 (kafka.cluster.Partition)
> [2016-04-20 01:13:43,826] INFO Partition [realtime_heartbeat,12] on broker 132: Shrinking ISR for partition [realtime_heartbeat,12] from 132,134,131 to 132 (kafka.cluster.Partition)
>
> So I discarded broker 132 and restarted the Kafka cluster. After 24 hours,
> the problem appeared again, this time on broker 131.
>
> I don't know what to do. Please help me.
>
> Best wishes!