Sorry - some final details on this for today. I stopped the broker that was being added to the cluster (the one with GC pauses approaching 1 min and both cores maxed out), dropped the max heap to 1GB (out of the instance's 4GB) and started it up again. It joined nicely, caught up and went into the ISR. I haven't yet run into the increasing GC cycles (which previously started during the first few minutes), and the GC logs now look like this:
2015-03-30T22:23:53.518+0300: 2494.633: [GC [1 CMS-initial-mark: 516589K(878208K)] 532706K(1031552K), 0.0043530 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2015-03-30T22:23:53.523+0300: 2494.637: [CMS-concurrent-mark-start]
2015-03-30T22:23:53.544+0300: 2494.658: [CMS-concurrent-mark: 0.021/0.021 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2015-03-30T22:23:53.544+0300: 2494.658: [CMS-concurrent-preclean-start]
2015-03-30T22:23:53.547+0300: 2494.661: [CMS-concurrent-preclean: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
2015-03-30T22:23:53.547+0300: 2494.661: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2015-03-30T22:23:58.595+0300: 2499.710: [CMS-concurrent-abortable-preclean: 0.442/5.049 secs] [Times: user=0.56 sys=0.01, real=5.05 secs]
2015-03-30T22:23:58.596+0300: 2499.710: [GC[YG occupancy: 38583 K (153344 K)]2015-03-30T22:23:58.596+0300: 2499.710: [GC2015-03-30T22:23:58.596+0300: 2499.710: [ParNew: 38583K->519K(153344K), 0.0051950 secs] 555173K->517224K(1031552K), 0.0052450 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2015-03-30T22:23:58.601+0300: 2499.715: [Rescan (parallel) , 0.0027310 secs]2015-03-30T22:23:58.604+0300: 2499.718: [weak refs processing, 0.0000320 secs]2015-03-30T22:23:58.604+0300: 2499.718: [class unloading, 0.0008720 secs]2015-03-30T22:23:58.605+0300: 2499.719: [scrub symbol table, 0.0009200 secs]2015-03-30T22:23:58.606+0300: 2499.720: [scrub string table, 0.0001670 secs] [1 CMS-remark: 516705K(878208K)] 517224K(1031552K), 0.0105120 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2015-03-30T22:23:58.606+0300: 2499.720: [CMS-concurrent-sweep-start]
2015-03-30T22:23:58.612+0300: 2499.727: [CMS-concurrent-sweep: 0.006/0.006 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2015-03-30T22:23:58.612+0300: 2499.727: [CMS-concurrent-reset-start]
2015-03-30T22:23:58.614+0300: 2499.728: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]

So I wonder whether decreasing the heap is what helped, or whether the GC issues on the first restart had something to do with the amount of temporary objects created while replicating all topics and catching up with the other brokers after a clean start? During the second restart I didn't touch the data directories, so most of the data already existed on the broker.

-Timo
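For reference, a minimal sketch of how a heap cap like this can be pinned so it survives restarts, without editing the start script. It assumes the stock 0.8.x kafka-server-start.sh, which only applies its own -Xmx1G -Xms1G default when KAFKA_HEAP_OPTS is unset; the paths are the ones used elsewhere in this thread:

  # Assumption: kafka-server-start.sh honours an externally set KAFKA_HEAP_OPTS
  export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
  /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties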
On 30 March 2015 at 21:28, Timo Ahokas <timo.aho...@gmail.com> wrote:

> A few more details before the looming crash. The broker is replicating data and in sync (before the GC cycle starts), with GC cycles taking nearly 1 min of real time:
>
> 2015-03-30T21:08:39.502+0300: 3805.931: [GC2015-03-30T21:08:39.502+0300: 3805.932: [ParNew: 142199K->6219K(153344K), 51.6581120 secs] 678471K->543078K(1518976K), 51.6582350 secs] [Times: user=103.29 sys=0.00, real=51.66 secs]
> 2015-03-30T21:10:31.844+0300: 3918.273: [GC2015-03-30T21:10:31.844+0300: 3918.273: [ParNew: 142539K->4773K(153344K), 53.1631370 secs] 679398K->542216K(1518976K), 53.1632500 secs] [Times: user=106.30 sys=0.00, real=53.17 secs]
> 2015-03-30T21:12:20.659+0300: 4027.088: [GC2015-03-30T21:12:20.659+0300: 4027.088: [ParNew: 141093K->2869K(153344K), 54.4376270 secs] 678536K->540931K(1518976K), 54.4377530 secs] [Times: user=108.85 sys=0.00, real=54.44 secs]
> 2015-03-30T21:14:12.462+0300: 4138.891: [GC2015-03-30T21:14:12.462+0300: 4138.891: [ParNew: 139189K->3812K(153344K), 56.1337310 secs] 677251K->542473K(1518976K), 56.1338620 secs] [Times: user=112.22 sys=0.00, real=56.14 secs]
> 2015-03-30T21:16:02.699+0300: 4249.128: [GC2015-03-30T21:16:02.699+0300: 4249.128: [ParNew: 140132K->2016K(153344K), 57.0575330 secs] 678793K->541266K(1518976K), 57.0576620 secs] [Times: user=114.07 sys=0.00, real=57.06 secs]
>
> Top shows Kafka/Java taking up both cores and all memory is used up (1.5GB allocated to Kafka out of 4GB on the instance):
>
> top - 21:18:35 up 1:20, 3 users, load average: 1.28, 1.04, 0.94
> Tasks: 130 total, 2 running, 128 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 99.7 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.3 st
> KiB Mem: 4046804 total, 3898580 used, 148224 free, 18748 buffers
> KiB Swap: 0 total, 0 used, 0 free. 2868556 cached Mem
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 13943 kafka 20 0 4915068 813388 12648 S 199.7 20.1 66:37.21 java
>
> Kafka startup params as follows:
>
> java -Xmx1500M -Xms1500M -server -XX:+UseCompressedOops -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
> -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC
> -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.port=9095
> -Dkafka.logs.dir=/opt/kafka/bin/../logs
> -Dlog4j.configuration=file:/opt/kafka/bin/../config/log4j.properties -cp
> :/opt/kafka/bin/../core/build/dependant-libs-2.8.0/*.jar:/opt/kafka/bin/../perf/build/libs//kafka-perf_2.8.0*.jar:/opt/kafka/bin/../clients/build/libs//kafka-clients*.jar:/opt/kafka/bin/../examples/build/libs//kafka-examples*.jar:/opt/kafka/bin/../contrib/hadoop-consumer/build/libs//kafka-hadoop-consumer*.jar:/opt/kafka/bin/../contrib/hadoop-producer/build/libs//kafka-hadoop-producer*.jar:/opt/kafka/bin/../libs/jopt-simple-3.2.jar:/opt/kafka/bin/../libs/kafka_2.9.2-0.8.1.1-javadoc.jar:/opt/kafka/bin/../libs/kafka_2.9.2-0.8.1.1-scaladoc.jar:/opt/kafka/bin/../libs/kafka_2.9.2-0.8.1.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.9.2-0.8.1.1.jar:/opt/kafka/bin/../libs/log4j-1.2.15.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/scala-library-2.9.2.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.2.jar:/opt/kafka/bin/../libs/snappy-java-1.0.5.jar:/opt/kafka/bin/../libs/zkclient-0.3.jar:/opt/kafka/bin/../libs/zookeeper-3.3.4.jar:/opt/kafka/bin/../core/build/libs/kafka_2.8.0*.jar
> kafka.Kafka /opt/kafka/config/server.properties
>
> Any suggestions more than welcome...
>
> Best regards,
> Timo
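A side note on the startup flags above: they appear to match the GC defaults baked into kafka-run-class.sh, so for experiments they can be overridden per start rather than by editing the scripts. A rough sketch, assuming the 0.8.1.1 scripts honour KAFKA_JVM_PERFORMANCE_OPTS; the added -XX:ParallelGCThreads=2 simply pins the ParNew worker count to the two vCPUs a t2.medium has (the logs above show user time at roughly twice real time, i.e. both cores busy in GC):

  # Hypothetical override of the default GC flags for one experimental run
  export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops -XX:+UseParNewGC \
    -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark \
    -XX:+DisableExplicitGC -XX:ParallelGCThreads=2 -Djava.awt.headless=true"
  /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties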
> On 30 March 2015 at 20:34, Timo Ahokas <timo.aho...@gmail.com> wrote:
>
>> Hi,
>>
>> A related issue which results in similar increasing GC cycles (but doesn't recover)... We're trying to bootstrap a new broker into the cluster (replacing one of the 3 main ones described previously). It connects fine, starts replicating data and becomes ISR, but then starts a similar GC cycle of increasing duration, e.g.:
>>
>> 2015-03-30T20:05:26.927+0300: 13.356: [CMS-concurrent-reset-start]
>> 2015-03-30T20:05:26.929+0300: 13.359: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>> 2015-03-30T20:05:27.768+0300: 14.198: [GC2015-03-30T20:05:27.769+0300: 14.198: [ParNew: 141729K->4172K(153344K), 3.1090220 secs] 657022K->519464K(1518976K), 3.1091410 secs] [Times: user=6.22 sys=0.00, real=3.11 secs]
>> 2015-03-30T20:05:31.749+0300: 18.178: [GC2015-03-30T20:05:31.749+0300: 18.178: [ParNew: 139718K->2145K(153344K), 3.1022510 secs] 655011K->517437K(1518976K), 3.1023670 secs] [Times: user=6.20 sys=0.00, real=3.10 secs]
>> 2015-03-30T20:05:36.411+0300: 22.841: [GC2015-03-30T20:05:36.411+0300: 22.841: [ParNew: 138465K->3230K(153344K), 3.0921260 secs] 653757K->518522K(1518976K), 3.0922300 secs] [Times: user=6.18 sys=0.01, real=3.09 secs]
>> 2015-03-30T20:05:40.392+0300: 26.821: [GC2015-03-30T20:05:40.392+0300: 26.821: [ParNew: 138209K->1170K(153344K), 3.3535970 secs] 653501K->516470K(1518976K), 3.9147070 secs] [Times: user=6.39 sys=0.00, real=3.92 secs]
>> 2015-03-30T20:05:45.251+0300: 31.680: [GC2015-03-30T20:05:45.251+0300: 31.680: [ParNew: 136773K->2394K(153344K), 3.1427200 secs] 652073K->517706K(1518976K), 3.1428120 secs] [Times: user=6.27 sys=0.00, real=3.15 secs]
>> 2015-03-30T20:05:50.294+0300: 36.723: [GC2015-03-30T20:05:51.469+0300: 37.899: [ParNew: 138208K->390K(153344K), 3.1697040 secs] 653519K->515701K(1518976K), 4.3452330 secs] [Times: user=6.29 sys=0.01, real=4.34 secs]
>> 2015-03-30T20:05:55.565+0300: 41.995: [GC2015-03-30T20:05:55.566+0300: 41.995: [ParNew: 136710K->3439K(153344K), 3.1681960 secs] 652021K->518769K(1518976K), 3.1682890 secs] [Times: user=6.32 sys=0.00, real=3.17 secs]
>> 2015-03-30T20:05:59.717+0300: 46.146: [GC2015-03-30T20:05:59.717+0300: 46.146: [ParNew: 138893K->2417K(153344K), 3.1984450 secs] 654224K->517770K(1518976K), 3.1985550 secs] [Times: user=6.40 sys=0.00, real=3.20 secs]
>> 2015-03-30T20:06:03.741+0300: 50.170: [GC2015-03-30T20:06:03.741+0300: 50.170: [ParNew: 138737K->1389K(153344K), 3.2550790 secs] 654090K->516778K(1518976K), 3.2551820 secs] [Times: user=6.51 sys=0.00, real=3.26 secs]
>> 2015-03-30T20:06:08.158+0300: 54.588: [GC2015-03-30T20:06:08.158+0300: 54.588: [ParNew: 137224K->2403K(153344K), 3.3705920 secs] 652613K->517813K(1518976K), 3.3707200 secs] [Times: user=6.59 sys=0.01, real=3.37 secs]
>> ....
>>
>> This goes on until the real time of each cycle reaches about 1 min, at which point the server becomes unresponsive.
>>
>> Any suggestions on what could be causing this?
>>
>> Best regards,
>> Timo
>>
>> On 26 March 2015 at 19:27, Timo Ahokas <timo.aho...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We are seeing really weird GC behaviour on our Kafka cluster. After a clean stop/start of the whole cluster, the servers go into a loop of increasingly slow GC cycles for an hour and then seem to recover to a normal state without any external changes to the servers or traffic. The load on the cluster is minimal - 10 to 100 req/s.
>>>
>>> Our setup:
>>>
>>> - Kafka 0.8.1.1
>>> - 3 brokers running in AWS (t2.medium instances with EBS storage)
>>> - 4GB memory with 1.5GB allocated to the Java heap
>>> - Under 10 topics, with replication factor of 3 & 1-10 partitions per topic
>>> - Ubuntu 14.04, Oracle JDK 1.7
>>>
>>> As mentioned, we start up the Kafka cluster and run very low load to it (less than 100 messages/second produced, initially in the low tens). After a couple of minutes the brokers start running GC with each cycle taking longer (1s -> 60s), which results in the cluster moving partition leadership around (presumably as the leader becomes unresponsive during GC). Then, nearly an hour later, the GC times drop down and end up in the 10-20 ms range after a few cycles. And the load on the cluster was actually increasing at this time (but still probably less than 100 req/s). It doesn't seem to be directly related to an external resource (e.g. EBS or instance problems), as we've seen this multiple times and with newly provisioned servers also.
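One cheap way to quantify that ramp-up is to pull the young-generation pause times straight out of the GC log named in the startup parameters above. A rough sketch - it also picks up the ParNew that runs as part of CMS remark, so treat the output as a trend rather than an exact series:

  # Print the real-time seconds of each ParNew pause, in order of occurrence
  grep ParNew /opt/kafka/bin/../logs/kafkaServer-gc.log | grep -o 'real=[0-9.]*' | cut -d= -f2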
>>> Here's an example of a GC log from one broker:
>>>
>>> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
>>> Memory: 4k page, physical 4046804k(2532400k free), swap 0k(0k free)
>>> CommandLine flags: -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -XX:InitialHeapSize=1572864000 -XX:+ManagementServer -XX:MaxHeapSize=1572864000 -XX:MaxNewSize=174485504 -XX:MaxTenuringThreshold=6 -XX:NewSize=174485504 -XX:OldPLABSize=16 -XX:OldSize=348971008 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>
>>> 2015-03-26T07:16:50.119+0200: 16.396: [GC [1 CMS-initial-mark: 512000K(1365632K)] 593874K(1518976K), 0.0188480 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
>>> 2015-03-26T07:16:50.138+0200: 16.415: [CMS-concurrent-mark-start]
>>> 2015-03-26T07:16:50.182+0200: 16.459: [CMS-concurrent-mark: 0.042/0.044 secs] [Times: user=0.06 sys=0.01, real=0.04 secs]
>>> 2015-03-26T07:16:50.182+0200: 16.459: [CMS-concurrent-preclean-start]
>>> 2015-03-26T07:16:50.202+0200: 16.479: [CMS-concurrent-preclean: 0.019/0.020 secs] [Times: user=0.03 sys=0.01, real=0.02 secs]
>>> 2015-03-26T07:16:50.202+0200: 16.479: [CMS-concurrent-abortable-preclean-start]
>>> CMS: abort preclean due to time 2015-03-26T07:16:55.286+0200: 21.563: [CMS-concurrent-abortable-preclean: 0.178/5.085 secs] [Times: user=0.75 sys=0.01, real=5.09 secs]
>>> 2015-03-26T07:16:55.287+0200: 21.564: [GC[YG occupancy: 106584 K (153344 K)]2015-03-26T07:16:55.287+0200: 21.564: [GC2015-03-26T07:16:55.287+0200: 21.564: [ParNew: 106584K->10552K(153344K), 0.0189980 secs] 618584K->522552K(1518976K), 0.0190710 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]
>>> 2015-03-26T07:16:55.306+0200: 21.583: [Rescan (parallel) , 0.0085400 secs]2015-03-26T07:16:55.314+0200: 21.591: [weak refs processing, 0.0000230 secs]2015-03-26T07:16:55.314+0200: 21.591: [class unloading, 0.0006920 secs]2015-03-26T07:16:55.315+0200: 21.592: [scrub symbol table, 0.0011410 secs]2015-03-26T07:16:55.316+0200: 21.593: [scrub string table, 0.0001530 secs] [1 CMS-remark: 512000K(1365632K)] 522552K(1518976K), 0.0298860 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
>>> 2015-03-26T07:16:55.318+0200: 21.595: [CMS-concurrent-sweep-start]
>>> 2015-03-26T07:16:55.322+0200: 21.599: [CMS-concurrent-sweep: 0.004/0.004 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2015-03-26T07:16:55.322+0200: 21.600: [CMS-concurrent-reset-start]
>>> 2015-03-26T07:16:55.329+0200: 21.606: [CMS-concurrent-reset: 0.006/0.007 secs] [Times: user=0.00 sys=0.01, real=0.01 secs]
>>> 2015-03-26T07:20:08.027+0200: 214.304: [GC2015-03-26T07:20:08.027+0200: 214.304: [ParNew: 146872K->2825K(153344K), 2.7813230 secs] 658872K->518034K(1518976K), 2.7814290 secs] [Times: user=5.58 sys=0.00, real=2.78 secs]
>>> 2015-03-26T07:20:56.462+0200: 262.739: [GC2015-03-26T07:20:56.462+0200: 262.739: [ParNew: 139145K->6899K(153344K), 2.9533990 secs] 654354K->522108K(1518976K), 2.9534960 secs] [Times: user=5.93 sys=0.00, real=2.95 secs]
>>> 2015-03-26T07:21:51.915+0200: 318.192: [GC2015-03-26T07:21:51.915+0200: 318.192: [ParNew: 143219K->14567K(153344K), 2.9688390 secs] 658428K->529776K(1518976K), 2.9689670 secs] [Times: user=5.97 sys=0.00, real=2.97 secs]
>>> 2015-03-26T07:22:55.621+0200: 381.898: [GC2015-03-26T07:22:55.621+0200: 381.898: [ParNew: 150887K->14000K(153344K), 4.3638880 secs] 666096K->530231K(1518976K), 4.3639820 secs] [Times: user=8.76 sys=0.00, real=4.37 secs]
>>> 2015-03-26T07:24:02.103+0200: 448.380: [GC2015-03-26T07:24:02.103+0200: 448.380: [ParNew: 150320K->7374K(153344K), 4.4849670 secs] 666551K->523605K(1518976K), 4.4850660 secs] [Times: user=9.00 sys=0.00, real=4.49 secs]
>>> 2015-03-26T07:25:06.110+0200: 512.388: [GC2015-03-26T07:25:06.111+0200: 512.388: [ParNew: 143694K->14118K(153344K), 4.4852630 secs] 659925K->530349K(1518976K), 4.4853500 secs] [Times: user=9.00 sys=0.00, real=4.48 secs]
>>> 2015-03-26T07:26:14.406+0200: 580.684: [GC2015-03-26T07:26:14.407+0200: 580.684: [ParNew: 150438K->16563K(153344K), 11.0235000 secs] 666669K->537384K(1518976K), 11.0236080 secs] [Times: user=22.12 sys=0.00, real=11.02 secs]
>>> 2015-03-26T07:27:59.141+0200: 685.418: [GC2015-03-26T07:27:59.141+0200: 685.418: [ParNew: 152883K->13870K(153344K), 13.1984820 secs] 673704K->534691K(1518976K), 13.1985800 secs] [Times: user=26.44 sys=0.00, real=13.20 secs]
>>> 2015-03-26T07:29:45.825+0200: 792.102: [GC2015-03-26T07:29:45.825+0200: 792.102: [ParNew: 150190K->8749K(153344K), 16.4442890 secs] 671011K->531093K(1518976K), 16.4443990 secs] [Times: user=32.92 sys=0.00, real=16.44 secs]
>>> 2015-03-26T07:31:43.393+0200: 909.670: [GC2015-03-26T07:31:43.393+0200: 909.670: [ParNew: 145069K->16666K(153344K), 17.1249960 secs] 667413K->539010K(1518976K), 17.1250910 secs] [Times: user=34.28 sys=0.00, real=17.13 secs]
>>> 2015-03-26T07:33:48.146+0200: 1034.423: [GC2015-03-26T07:33:48.146+0200: 1034.423: [ParNew: 152986K->10212K(153344K), 23.7178960 secs] 675330K->536361K(1518976K), 23.7180010 secs] [Times: user=47.45 sys=0.00, real=23.71 secs]
>>> 2015-03-26T07:35:41.187+0200: 1147.464: [GC2015-03-26T07:35:41.187+0200: 1147.464: [ParNew: 146532K->15576K(153344K), 25.4787270 secs] 672681K->541725K(1518976K), 25.4788360 secs] [Times: user=50.97 sys=0.00, real=25.48 secs]
>>> 2015-03-26T07:37:26.072+0200: 1252.349: [GC2015-03-26T07:37:26.072+0200: 1252.349: [ParNew: 151896K->9807K(153344K), 30.2121220 secs] 678045K->539111K(1518976K), 30.2122180 secs] [Times: user=60.42 sys=0.00, real=30.22 secs]
>>> 2015-03-26T07:40:14.518+0200: 1420.795: [GC2015-03-26T07:40:14.518+0200: 1420.795: [ParNew: 146127K->17024K(153344K), 32.0691660 secs] 675431K->546342K(1518976K), 32.0692640 secs] [Times: user=64.11 sys=0.00, real=32.07 secs]
>>> 2015-03-26T07:41:47.096+0200: 1513.373: [GC2015-03-26T07:41:47.096+0200: 1513.373: [ParNew: 153344K->10479K(153344K), 40.0872490 secs] 682662K->543861K(1518976K), 40.0873620 secs] [Times: user=80.16 sys=0.00, real=40.09 secs]
>>> 2015-03-26T07:43:20.840+0200: 1607.117: [GC2015-03-26T07:43:20.840+0200: 1607.118: [ParNew: 146799K->17024K(153344K), 41.4909450 secs] 680181K->550413K(1518976K), 41.4910490 secs] [Times: user=82.97 sys=0.00, real=41.49 secs]
>>> 2015-03-26T07:45:38.200+0200: 1744.477: [GC2015-03-26T07:45:38.200+0200: 1744.477: [ParNew: 153344K->11552K(153344K), 44.5490790 secs] 686733K->546792K(1518976K), 44.5491900 secs] [Times: user=89.08 sys=0.00, real=44.55 secs]
>>> 2015-03-26T07:49:56.546+0200: 2002.823: [GC2015-03-26T07:49:56.546+0200: 2002.823: [ParNew: 147872K->10963K(153344K), 45.4665010 secs] 683112K->546204K(1518976K), 45.4666110 secs] [Times: user=90.91 sys=0.00, real=45.46 secs]
>>> 2015-03-26T07:54:35.158+0200: 2281.435: [GC2015-03-26T07:54:35.158+0200: 2281.435: [ParNew: 147283K->12000K(153344K), 45.3558830 secs] 682524K->547241K(1518976K), 45.3559960 secs] [Times: user=90.68 sys=0.00, real=45.35 secs]
>>> 2015-03-26T07:58:58.888+0200: 2545.165: [GC2015-03-26T07:58:58.888+0200: 2545.165: [ParNew: 148320K->8521K(153344K), 50.8855520 secs] 683561K->546682K(1518976K), 50.8856520 secs] [Times: user=101.73 sys=0.00, real=50.88 secs]
>>> 2015-03-26T08:03:25.615+0200: 2811.892: [GC2015-03-26T08:03:25.615+0200: 2811.892: [ParNew: 144841K->6887K(153344K), 56.1553500 secs] 683002K->547305K(1518976K), 56.1554700 secs] [Times: user=112.26 sys=0.00, real=56.16 secs]
>>> 2015-03-26T08:07:59.738+0200: 3086.015: [GC2015-03-26T08:07:59.738+0200: 3086.015: [ParNew: 143207K->5223K(153344K), 11.1719880 secs] 683625K->547498K(1518976K), 11.1720970 secs] [Times: user=22.33 sys=0.00, real=11.17 secs]
>>> 2015-03-26T08:11:41.909+0200: 3308.186: [GC2015-03-26T08:11:41.909+0200: 3308.186: [ParNew: 141543K->5778K(153344K), 0.0121180 secs] 683818K->548443K(1518976K), 0.0121990 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
>>> 2015-03-26T08:15:05.817+0200: 3512.094: [GC2015-03-26T08:15:05.817+0200: 3512.094: [ParNew: 142098K->10943K(153344K), 0.0228440 secs] 684763K->554166K(1518976K), 0.0229390 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]
>>> 2015-03-26T08:16:50.297+0200: 3616.574: [GC2015-03-26T08:16:50.297+0200: 3616.574: [ParNew: 147263K->16262K(153344K), 0.0269190 secs] 690486K->560618K(1518976K), 0.0270040 secs] [Times: user=0.05 sys=0.01, real=0.02 secs]
>>>
>>> We know the EBS storage is less than ideal, but we've tested a similar setup with significantly higher simulated load without any issues.
>>>
>>> Any suggestions on what to check, test, etc.?
>>>
>>> Best regards,
>>>
>>> Timo
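On the question of what to check: one low-effort option is to watch the GC counters live while a broker is catching up, rather than only reading the log afterwards. A sketch, assuming the Oracle JDK 7 tools are on the path and that the broker is the only local process whose command line contains kafka.Kafka (as in the startup command quoted earlier):

  # Sample young/old generation occupancy and cumulative GC time every 5 seconds
  jstat -gcutil $(pgrep -f kafka.Kafka) 5000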