Hello Kafka Experts.
We have experienced a long running GC problem for a while. GC will eventually eat up all the cpu cycles on the physical box. Have chatted with Gwen regarding this issue during recent ended Hadoop conference at NYC, per her recommendation we switched to G1, but we still see the same problem happening. We have tried on two different version of jdk (java7u51 and java7u71) and two different GC (CMS and G1). The fun part for this problem is that once we run jstack to do thread dump it will immediately terminate the long running GC and everything is back to normal. I am attaching my jvm parameters and gc logs as reference and hope some experts can shed light on this problem -Xmx4G -Xms4G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/logs/kaf/kafka-broker-solr-hbase/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps GC log 2015-10-15T14:35:29.910-0500: 167428.343: [GC pause (young), 0.0087380 secs] [Parallel Time: 5.8 ms, GC Workers: 18] [GC Worker Start (ms): Min: 167428342.9, Avg: 167428343.1, Max: 167428343.3, Diff: 0.4] [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.7, Diff: 0.8, Sum: 23.2] [Update RS (ms): Min: 2.3, Avg: 2.6, Max: 2.8, Diff: 0.5, Sum: 46.2] [Processed Buffers: Min: 4, Avg: 5.9, Max: 10, Diff: 6, Sum: 107] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.9] [Object Copy (ms): Min: 1.4, Avg: 1.4, Max: 1.6, Diff: 0.2, Sum: 25.9] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.8] [GC Worker Total (ms): Min: 5.2, Avg: 5.5, Max: 5.7, Diff: 0.4, Sum: 98.2] [GC Worker End (ms): Min: 167428348.6, Avg: 167428348.6, Max: 167428348.7, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 0.8 ms] [Other: 2.2 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.8 ms] [Ref Enq: 0.0 ms] [Free CSet: 1.3 ms] [Eden: 788.0M(788.0M)->0.0B(952.0M) Survivors: 14.0M->6144.0K Heap: 1199.6M(4096.0M)->403.9M(4096.0M)] [Times: user=0.11 sys=0.00, real=0.01 secs] 2015-10-15T14:35:36.522-0500: 167434.955: [GC pause (young), 1227.2679990 secs] [Parallel Time: 1227265.2 ms, GC Workers: 18] [GC Worker Start (ms): Min: 167434954.8, Avg: 167434955.0, Max: 167434955.2, Diff: 0.4] [Ext Root Scanning (ms): Min: 1.1, Avg: 68182.6, Max: 1227264.8, Diff: 1227263.6, Sum: 1227287.6] [Update RS (ms): Min: 0.0, Avg: 1.9, Max: 2.2, Diff: 2.2, Sum: 34.1] [Processed Buffers: Min: 0, Avg: 4.8, Max: 11, Diff: 11, Sum: 87] [Scan RS (ms): Min: 0.1, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 2.4] [Object Copy (ms): Min: 0.1, Avg: 1.4, Max: 1.6, Diff: 1.5, Sum: 25.2] [Termination (ms): Min: 0.0, Avg: 1159078.8, Max: 1227259.9, Diff: 1227259.9, Sum: 20863418.3] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 1227264.6, Avg: 1227264.9, Max: 1227265.1, Diff: 0.4, Sum: 22090767.9] [GC Worker End (ms): Min: 168662219.8, Avg: 168662219.9, Max: 168662219.9, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 0.6 ms] [Other: 2.2 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.6 ms] [Ref Enq: 0.0 ms] [Free CSet: 1.5 ms] [Eden: 952.0M(952.0M)->0.0B(198.0M) Survivors: 6144.0K->6144.0K Heap: 1366.0M(4096.0M)->414.0M(4096.0M)] [Times: user=20341.80 sys=0.00, real=1227.08 secs] 2015-10-15T14:56:03.852-0500: 168662.285: [GC pause (young), 0.0050930 secs] [Parallel Time: 3.3 ms, GC Workers: 18] [GC Worker Start (ms): Min: 168662284.6, Avg: 168662284.8, Max: 168662285.0, Diff: 0.4] [Ext Root Scanning (ms): Min: 1.0, Avg: 1.4, Max: 1.8, Diff: 0.8, Sum: 24.3] [Update RS (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum: 2.4] [Processed Buffers: Min: 0, Avg: 2.4, Max: 8, Diff: 8, Sum: 44] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 1.0, Avg: 1.2, Max: 1.3, Diff: 0.3, Sum: 22.5] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 2.6, Avg: 2.8, Max: 3.0, Diff: 0.4, Sum: 49.7] [GC Worker End (ms): Min: 168662287.6, Avg: 168662287.6, Max: 168662287.6, Diff: 0.0] [Code Root Fixup: 0.0 ms] [Clear CT: 0.8 ms] [Other: 1.1 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.8 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 30.0M(198.0M)->0.0B(198.0M) Survivors: 6144.0K->6144.0K Heap: 442.6M(4096.0M)->413.6M(4096.0M)] [Times: user=0.05 sys=0.00, real=0.01 secs] Yan ** This email and any attachments may contain information that is confidential and/or privileged for the sole use of the intended recipient. Any use, review, disclosure, copying, distribution or reliance by others, and any forwarding of this email or its contents, without the express permission of the sender is strictly prohibited by law. If you are not the intended recipient, please contact the sender immediately, delete the e-mail and destroy all copies. **