[
https://issues.apache.org/jira/browse/KAFKA-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589963#comment-16589963
]
ruiliang commented on KAFKA-7327:
---------------------------------
After I upgraded from kafka1.1.0 to kafka_2.12 to 2.0.0, now the memory in the
heap is recycled, but it may be that the memory outside the heap is still
soaring, not falling, but slowly rising.What's the reason?Please show me thanks
* It feels like it's about to die
This is the start log, and the parameter-related configuration
[^server.log.2018-08-23-10]
{code:java}
//代码占位符
free -m
total used free shared buff/cache available
Mem: 3790 3013 167 0 609 545
top - 17:21:15 up 21 days, 7:20, 2 users, load average: 0.02, 0.03, 0.05
Tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3881688 total, 172536 free, 3085836 used, 623316 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 557800 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16696 liandong 20 0 7317808 2.706g 13304 S 0.0 73.1 4:06.75 java
19754 liandong 20 0 154584 2276 960 S 0.0 0.1 0:00.01 sshd
19730 liandong 20 0 154580 2268 952 S 0.0 0.1 0:00.00 sshd
20700 liandong 20 0 161812 2172 1528 R 0.0 0.1 0:00.00 top
19755 liandong 20 0 115396 2100 1644 S 0.0 0.1 0:00.01 bash
19731 liandong 20 0 115396 2044 1608 S 0.0 0.1 0:00.00 bash
[liandong@prod_data_kafka_1 ~]$ jmap -heap 16696
Attaching to process ID 16696, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.151-b12
using thread-local object allocation.
Garbage-First (G1) GC with 2 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 2147483648 (2048.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 1287651328 (1228.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 1048576 (1.0MB)
Heap Usage:
G1 Heap:
regions = 2048
capacity = 2147483648 (2048.0MB)
used = 290261024 (276.8144836425781MB)
free = 1857222624 (1771.1855163574219MB)
13.51633220911026% used
G1 Young Generation:
Eden Space:
regions = 88
capacity = 105906176 (101.0MB)
used = 92274688 (88.0MB)
free = 13631488 (13.0MB)
87.12871287128714% used
Survivor Space:
regions = 7
capacity = 7340032 (7.0MB)
used = 7340032 (7.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 184
capacity = 2034237440 (1940.0MB)
used = 190646304 (181.81448364257812MB)
free = 1843591136 (1758.1855163574219MB)
9.371880600132894% used
11241 interned Strings occupying 1275336 bytes.
{code}
> The CPU memory of the kafak master node continues to soar, does not recycle,
> and finally the service fails?
> -----------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-7327
> URL: https://issues.apache.org/jira/browse/KAFKA-7327
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 1.1.0
> Environment: linux centos7
> Reporter: ruiliang
> Priority: Blocker
> Attachments: server.log.2018-08-23-10
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> Xmlipcregsvc-> 172.18.58.184:60686 (CLOSE_WAIT) has many such ports to close
> waiting, which is the application connection side.Why wait?Memory nor
> recycling three services are 2 nuclear 4 gb of memory, this before is
> kafka3G, found that memory, the heap memory, and then I will limit kfaka up
> to 2 g, but the master node to run after a period of time, and submitted to
> the heap memory and heap memory leak, I free -m looked at it and really have
> 100 MB of memory, I don't know where memory use, kafka made up 80% of the
> process of memory, CPU by more than 100%, what reason is this?The
> configuration parameters have been checked with the official website. The
> default is not acceptable.
> XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT)
> 有很多这个样的端口关闭等待,这是应用连接端。为什么一直等待呢?内存也没有回收 我3台服务是 2核 4G
> 内存,这之前给的是kafka3G,发现内存没了,报堆外内存溢出,然后我就限制kfaka最大为2G,但主节点跑一段时间后,又报堆内存溢出和堆外内存溢出,我free
> -m看了一下,内存确实还有100MB了,不知内存用在那里,kafka 这个进程暂用完了 80%的内存,cpu
> 100%多了,这是什么原因呢?配置参数和官网核对了一下,全用默认的也不行,
> ` 1772 liandong 20 0 6398984 2.146g 16112 S 101.3 58.0 93:59.72
> /usr/local/jdk1.8/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC
> -XX:+HeapDumpOnOutOfMemoryError -XX:MaxGCPauseMillis=20
> -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent
> -Djava.awt.headless=true -XX:MaxDirectMemorySize=512m
> -Xloggc:/data/kafka/bin/../logs/kafkaSer+...`
> kafka server.log log
> `[2018-08-23 07:56:11,788] INFO [GroupCoordinator 0]: Stabilized group
> consumer.web.log generation 268 (__consumer_offsets-24)
> (kafka.coordinator.group.GroupCoordinator)
> [2018-08-23 07:56:12,054] ERROR Processor got uncaught exception.
> (kafka.network.Processor)
> java.lang.OutOfMemoryError: Java heap space
> [2018-08-23 07:56:13,846] ERROR Processor got uncaught exception.
> (kafka.network.Processor)
> java.lang.OutOfMemoryError: Java heap space
> [2018-08-23 07:56:15,673] ERROR Processor got uncaught exception.
> (kafka.network.Processor)
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> at
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
> at
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:145)
> at
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
> at
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
> at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
> at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557)
> at
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
> at kafka.network.Processor.poll(SocketServer.scala:628)
> at kafka.network.Processor.run(SocketServer.scala:545)
> at java.lang.Thread.run(Thread.java:748)
> [2018-08-23 07:56:16,379] ERROR Processor got uncaught exception.
> (kafka.network.Processor)
> java.lang.OutOfMemoryError: Java heap space`
> 172.18.58.184:speedtrace (CLOSE_WAIT) 172.18.58.184 是 kafka client connect
> lsof -i | grep java
> `java 1772 liandong 83u IPv4 7990697 0t0 TCP *:36145 (LISTEN)
> java 1772 liandong 84u IPv4 7990698 0t0 TCP *:9099 (LISTEN)
> java 1772 liandong 85u IPv4 7990701 0t0 TCP *:40745 (LISTEN)
> java 1772 liandong 100u IPv4 7990709 0t0 TCP
> prod_data_kafka_2:44688->prod_data_zk:eforward (ESTABLISHED)
> java 1772 liandong 193u IPv4 7989816 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc
> (LISTEN)
> java 1772 liandong 224u IPv4 8019955 0t0 TCP
> prod_data_kafka_2:9099->172.18.58.184:47430 (ESTABLISHED)
> java 1772 liandong 228u IPv4 8018733 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33032 (CLOSE_WAIT)
> java 1772 liandong 229u IPv4 7990859 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:51334 (ESTABLISHED)
> java 1772 liandong 230u IPv4 8022506 0t0 TCP
> prod_data_kafka_2:36145->172.18.58.184:46112 (ESTABLISHED)
> java 1772 liandong 235u IPv4 7989829 0t0 TCP
> prod_data_kafka_2:32976->prod_data_kafka_1:XmlIpcRegSvc (ESTABLISHED)
> java 1772 liandong 236u IPv4 8022224 0t0 TCP
> prod_data_kafka_2:36145->172.18.58.184:46024 (ESTABLISHED)
> java 1772 liandong 243u IPv4 7998548 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_3:39816 (ESTABLISHED)
> java 1772 liandong 247u IPv4 7998555 0t0 TCP
> prod_data_kafka_2:33206->prod_data_kafka_3:XmlIpcRegSvc (ESTABLISHED)
> java 1772 liandong 248u IPv4 8017061 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT)
> java 1772 liandong 251u IPv4 7999481 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_1:48914 (ESTABLISHED)
> java 1772 liandong 254u IPv4 8016659 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60920 (CLOSE_WAIT)
> java 1772 liandong 255u IPv4 8009660 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:59356 (ESTABLISHED)
> java 1772 liandong 256u IPv4 8017062 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60700 (ESTABLISHED)
> java 1772 liandong 257u IPv4 8022398 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33626 (ESTABLISHED)
> java 1772 liandong 259u IPv4 8019887 0t0 TCP
> prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:speedtrace (CLOSE_WAIT)
> `
> gc log
> `2018-08-22T19:01:45.014+0800: 31537.291: [GC pause (G1 Evacuation Pause)
> (young) (initial-mark), 0.0147456 secs]
> [Parallel Time: 12.9 ms, GC Workers: 2]
> [GC Worker Start (ms): Min: 31537291.3, Avg: 31537291.3, Max: 31537291.3,
> Diff: 0.0]
> [Ext Root Scanning (ms): Min: 1.8, Avg: 1.9, Max: 1.9, Diff: 0.1, Sum: 3.8]
> [Update RS (ms): Min: 1.9, Avg: 1.9, Max: 1.9, Diff: 0.0, Sum: 3.9]
> [Processed Buffers: Min: 14, Avg: 15.5, Max: 17, Diff: 3, Sum: 31]
> [Scan RS (ms): Min: 4.5, Avg: 4.5, Max: 4.5, Diff: 0.0, Sum: 9.0]
> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
> [Object Copy (ms): Min: 4.1, Avg: 4.2, Max: 4.2, Diff: 0.1, Sum: 8.3]
> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
> [Termination Attempts: Min: 4, Avg: 4.0, Max: 4, Diff: 0, Sum: 8]
> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
> [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 25.1]
> [GC Worker End (ms): Min: 31537303.8, Avg: 31537303.8, Max: 31537303.8,
> Diff: 0.0]
> [Code Root Fixup: 0.1 ms]
> [Code Root Purge: 0.0 ms]
> [Clear CT: 0.3 ms]
> [Other: 1.4 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 0.2 ms]
> [Ref Enq: 0.0 ms]
> [Redirty Cards: 0.1 ms]
> [Humongous Register: 0.0 ms]
> [Humongous Reclaim: 0.1 ms]
> [Free CSet: 0.6 ms]
> [Eden: 781.0M(781.0M)->0.0B(781.0M) Survivors: 3072.0K->3072.0K Heap:
> 2106.0M(2347.0M)->1325.2M(2347.0M)]
> [Times: user=0.03 sys=0.00, real=0.02 secs]
> 2018-08-22T19:01:45.029+0800: 31537.306: [GC
> concurrent-root-region-scan-start]
> 2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-root-region-scan-end,
> 0.0098860 secs]
> 2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-mark-start]
> 2018-08-22T19:01:45.111+0800: 31537.388: [GC concurrent-mark-end, 0.0721221
> secs]
> 2018-08-22T19:01:45.111+0800: 31537.388: [GC remark
> 2018-08-22T19:01:45.111+0800: 31537.388: [Finalize Marking, 0.0002506 secs]
> 2018-08-22T19:01:45.111+0800: 31537.388: [GC ref-proc, 0.0008536 secs]
> 2018-08-22T19:01:45.112+0800: 31537.389: [Unloading, 0.0159521 secs],
> 0.0264459 secs]
> [Times: user=0.05 sys=0.00, real=0.03 secs]
> 2018-08-22T19:01:45.139+0800: 31537.415: [GC cleanup 1339M->1339M(2347M),
> 0.0026152 secs]
> [Times: user=0.00 sys=0.00, real=0.00 secs]
> 2018-08-22T19:01:48.222+0800: 31540.499: [GC pause (G1 Evacuation Pause)
> (young), 0.0141944 secs]
> [Parallel Time: 12.6 ms, GC Workers: 2]
> [GC Worker Start (ms): Min: 31540499.4, Avg: 31540499.4, Max: 31540499.4,
> Diff: 0.0]
> [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.1, Sum: 2.8]
> [Update RS (ms): Min: 2.3, Avg: 2.3, Max: 2.3, Diff: 0.1, Sum: 4.6]
> [Processed Buffers: Min: 11, Avg: 17.0, Max: 23, Diff: 12, Sum: 34]
> [Scan RS (ms): Min: 4.4, Avg: 4.5, Max: 4.5, Diff: 0.1, Sum: 8.9]
> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
> [Object Copy (ms): Min: 4.2, Avg: 4.3, Max: 4.3, Diff: 0.1, Sum: 8.5]
> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
> [Termination Attempts: Min: 1, Avg: 2.5, Max: 4, Diff: 3, Sum: 5]
> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
> [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 24.9]
> [GC Worker End (ms): Min: 31540511.9, Avg: 31540511.9, Max: 31540511.9,
> Diff: 0.0]
> [Code Root Fixup: 0.1 ms]
> [Code Root Purge: 0.0 ms]
> [Clear CT: 0.3 ms]
> [Other: 1.3 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 0.2 ms]
> [Ref Enq: 0.0 ms]
> [Redirty Cards: 0.1 ms]
> [Humongous Register: 0.1 ms]
> [Humongous Reclaim: 0.1 ms]
> [Free CSet: 0.6 ms]
> [Eden: 781.0M(781.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap:
> 2106.2M(2347.0M)->1325.2M(2347.0M)]
> [Times: user=0.02 sys=0.00, real=0.01 secs]
> 2018-08-22T19:01:51.373+0800: 31543.650: [GC pause (G1 Evacuation Pause)
> (young), 0.0146431 secs]
> [Parallel Time: 13.1 ms, GC Workers: 2]
> [GC Worker Start (ms): Min: 31543649.9, Avg: 31543649.9, Max: 31543649.9,
> Diff: 0.0]
> [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.5, Diff: 0.1, Sum: 2.8]
> [Update RS (ms): Min: 2.4, Avg: 2.4, Max: 2.5, Diff: 0.1, Sum: 4.8]
> [Processed Buffers: Min: 8, Avg: 17.5, Max: 27, Diff: 19, Sum: 35]
> [Scan RS (ms): Min: 4.5, Avg: 4.6, Max: 4.7, Diff: 0.2, Sum: 9.2]
> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
> [Object Copy (ms): Min: 4.4, Avg: 4.4, Max: 4.5, Diff: 0.1, Sum: 8.9]
> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
> [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
> [GC Worker Total (ms): Min: 13.0, Avg: 13.0, Max: 13.0, Diff: 0.0, Sum: 25.9]
> [GC Worker End (ms): Min: 31543662.8, Avg: 31543662.8, Max: 31543662.9,
> Diff: 0.0]
> [Code Root Fixup: 0.1 ms]
> [Code Root Purge: 0.0 ms]
> [Clear CT: 0.4 ms]
> [Other: 1.2 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 0.1 ms]
> [Ref Enq: 0.0 ms]
> [Redirty Cards: 0.1 ms]
> [Humongous Register: 0.1 ms]
> [Humongous Reclaim: 0.1 ms]
> [Free CSet: 0.6 ms]
> [Eden: 780.0M(780.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap:
> 2105.2M(2347.0M)->1325.3M(2347.0M)]
> [Times: user=0.02 sys=0.00, real=0.02 secs] `
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)