I'm looking into performance issues on a 0.6.1 cluster. I see two symptoms: 1. Reads and writes are slow 2. One of the hosts is doing a lot of GC.
1 is slow in the sense that in normal state the cluster used to make around 3-5k read and writes per second (6-10k operations per second), but how it's in the order of 200-400 ops per second, sometimes even less. 2 looks like this: $ tail -f /outbrain/cassandra/log/system.log INFO [GC inspection] 2010-05-04 00:42:18,636 GCInspector.java (line 110) GC for ParNew: 672 ms, 166482384 reclaimed leaving 2872087208 used; max is 4432068608 INFO [GC inspection] 2010-05-04 00:42:28,638 GCInspector.java (line 110) GC for ParNew: 498 ms, 166493352 reclaimed leaving 2836049448 used; max is 4432068608 INFO [GC inspection] 2010-05-04 00:42:38,640 GCInspector.java (line 110) GC for ParNew: 327 ms, 166091528 reclaimed leaving 2796888424 used; max is 4432068608 ... and it goes on and on for hours, no stopping... The cluster is made of 6 hosts, 3 in one DC and 3 in another. Each host has 8G RAM. -Xmx=4G For some reason, the load isn't distributed evenly b/w the hosts, although I'm not sure this is the cause for slowness $ nodetool -h localhost -p 9004 ring Address Status Load Range Ring 144413773383729447702215082383444206680 192.168.252.99Up 15.94 GB 66002764663998929243644931915471302076 |<--| 192.168.254.57Up 19.84 GB 81288739225600737067856268063987022738 | ^ 192.168.254.58Up 973.78 MB 86999744104066390588161689990810839743 v | 192.168.252.62Up 5.18 GB 88308919879653155454332084719458267849 | ^ 192.168.254.59Up 10.57 GB 142482163220375328195837946953175033937 v | 192.168.252.61Up 11.36 GB 144413773383729447702215082383444206680 |-->| The slow host is 192.168.252.61 and it isn't the most loaded one. The host is waiting a lot on IO and the load average is usually 6-7 $ w 00:42:56 up 11 days, 13:22, 1 user, load average: 6.21, 5.52, 3.93 $ vmstat 5 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 8 2147844 45744 1816 4457384 6 5 66 32 5 2 1 1 96 2 0 0 8 2147164 49020 1808 4451596 385 0 2345 58 3372 9957 2 2 78 18 0 0 3 2146432 45704 1812 4453956 342 0 2274 108 3937 10732 2 2 78 19 0 0 1 2146252 44696 1804 4453436 345 164 1939 294 3647 7833 2 2 78 18 0 0 1 2145960 46924 1744 4451260 158 0 2423 122 4354 14597 2 2 77 18 0 7 1 2138344 44676 952 4504148 1722 403 1722 406 1388 439 87 0 10 2 0 7 2 2137248 45652 956 4499436 1384 655 1384 658 1356 392 87 0 10 3 0 7 1 2135976 46764 956 4495020 1366 718 1366 718 1395 380 87 0 9 4 0 0 8 2134484 46964 956 4489420 1673 555 1814 586 1601 215590 14 2 68 16 0 0 1 2135388 47444 972 4488516 785 833 2390 995 3812 8305 2 2 77 20 0 0 10 2135164 45928 980 4488796 788 543 2275 626 36 So, the host is swapping like crazy... top shows that it's using a lot of memory. As noted before -Xmx=4G and nothing else seems to be using a lot of memory on the host except for the cassandra process, however, of the 8G ram on the host, 92% is used by cassandra. How's that? Top shows there's 3.9g Shared and 7.2g Resident and *15.9g Virtual*. Why does it have 15g virtual? And why 7.2 RES? This can explain the slowness in swapping. $ top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 20281 cassandr 25 0 15.9g 7.2g 3.9g S 33.3 92.6 175:30.27 java So, can the total memory be controlled? Or perhaps I'm looking in the wrong direction... I've looked at all the cassandra JMX counts and nothing seemed suspicious so far. By suspicious i mean a large number of pending tasks - there were always very small numbers in each pool. About read and write latencies, I'm not sure what the normal state is, but here's an example of what I see on the problematic host: #mbean = org.apache.cassandra.service:type=StorageProxy: RecentReadLatencyMicros = 30105.888180684495; TotalReadLatencyMicros = 78543052801; TotalWriteLatencyMicros = 4213118609; RecentWriteLatencyMicros = 1444.4809201925639; ReadOperations = 4779553; RangeOperations = 0; TotalRangeLatencyMicros = 0; RecentRangeLatencyMicros = NaN; WriteOperations = 4740093; And the only pool that I do see some pending tasks is the ROW-READ-STAGE, but it doesn't look like much, usually around 6-8: #mbean = org.apache.cassandra.concurrent:type=ROW-READ-STAGE: ActiveCount = 8; PendingTasks = 8; CompletedTasks = 5427955; Any help finding the solution is appreciated, thanks... Below are a few more JMXes I collected from the system that may be interesting. #mbean = java.lang:type=Memory: Verbose = false; HeapMemoryUsage = { committed = 3767279616; init = 134217728; max = 4293656576; used = 1237105080; }; NonHeapMemoryUsage = { committed = 35061760; init = 24313856; max = 138412032; used = 23151320; }; ObjectPendingFinalizationCount = 0; #mbean = java.lang:name=ParNew,type=GarbageCollector: LastGcInfo = { GcThreadCount = 11; duration = 136; endTime = 42219272; id = 11719; memoryUsageAfterGc = { ( CMS Perm Gen ) = { key = CMS Perm Gen; value = { committed = 29229056; init = 21757952; max = 88080384; used = 17648848; }; }; ( Code Cache ) = { key = Code Cache; value = { committed = 5832704; init = 2555904; max = 50331648; used = 5563520; }; }; ( CMS Old Gen ) = { key = CMS Old Gen; value = { committed = 3594133504; init = 112459776; max = 4120510464; used = 964565720; }; }; ( Par Eden Space ) = { key = Par Eden Space; value = { committed = 171835392; init = 21495808; max = 171835392; used = 0; }; }; ( Par Survivor Space ) = { key = Par Survivor Space; value = { committed = 1310720; init = 131072; max = 1310720; used = 0; }; }; }; memoryUsageBeforeGc = { ( CMS Perm Gen ) = { key = CMS Perm Gen; value = { committed = 29229056; init = 21757952; max = 88080384; used = 17648848; }; }; ( Code Cache ) = { key = Code Cache; value = { committed = 5832704; init = 2555904; max = 50331648; used = 5563520; }; }; ( CMS Old Gen ) = { key = CMS Old Gen; value = { committed = 3594133504; init = 112459776; max = 4120510464; used = 959221872; }; }; ( Par Eden Space ) = { key = Par Eden Space; value = { committed = 171835392; init = 21495808; max = 171835392; used = 171835392; }; }; ( Par Survivor Space ) = { key = Par Survivor Space; value = { committed = 1310720; init = 131072; max = 1310720; used = 0; }; }; }; startTime = 42219136; }; CollectionCount = 11720; CollectionTime = 4561730; Name = ParNew; Valid = true; MemoryPoolNames = [ Par Eden Space, Par Survivor Space ]; #mbean = java.lang:type=OperatingSystem: MaxFileDescriptorCount = 63536; OpenFileDescriptorCount = 75; CommittedVirtualMemorySize = 17787711488; FreePhysicalMemorySize = 45522944; FreeSwapSpaceSize = 2123968512; ProcessCpuTime = 12251460000000; TotalPhysicalMemorySize = 8364417024; TotalSwapSpaceSize = 4294959104; Name = Linux; AvailableProcessors = 8; Arch = amd64; SystemLoadAverage = 4.36; Version = 2.6.18-164.15.1.el5; #mbean = java.lang:type=Runtime: Name = 20...@ob1061.nydc1.outbrain.com; ClassPath = /outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../build/classes:/outbrain/cassandra/apache-cassandra-0.6.1/bin/.. /lib/antlr-3.1.3.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/apache-cassandra-0.6.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/avro-1.2.0-dev.jar:/outb rain/cassandra/apache-cassandra-0.6.1/bin/../lib/clhm-production.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-cli-1.1.jar:/outbrain/cassandra/apache-cassandra- 0.6.1/bin/../lib/commons-codec-1.2.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-collections-3.2.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/com mons-lang-2.4.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/google-collections-1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/hadoop-core-0.20.1.jar:/out brain/cassandra/apache-cassandra-0.6.1/bin/../lib/high-scale-lib.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/ivy-2.1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/ bin/../lib/jackson-core-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jackson-mapper-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jline -0.9.94.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/json-simple-1.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/libthrift-r917130.jar:/outbrain/cassandr a/apache-cassandra-0.6.1/bin/../lib/log4j-1.2.14.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/slf4j-api-1.5.8.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib /slf4j-log4j12-1.5.8.jar; BootClassPath = /usr/java/jdk1.6.0_17/jre/lib/alt-rt.jar:/usr/java/jdk1.6.0_17/jre/lib/resources.jar:/usr/java/jdk1.6.0_17/jre/lib/rt.jar:/usr/java/jdk1.6.0_17/jre/lib/sunrsasign.j ar:/usr/java/jdk1.6.0_17/jre/lib/jsse.jar:/usr/java/jdk1.6.0_17/jre/lib/jce.jar:/usr/java/jdk1.6.0_17/jre/lib/charsets.jar:/usr/java/jdk1.6.0_17/jre/classes; LibraryPath = /usr/java/jdk1.6.0_17/jre/lib/amd64/server:/usr/java/jdk1.6.0_17/jre/lib/amd64:/usr/java/jdk1.6.0_17/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib; VmName = Java HotSpot(TM) 64-Bit Server VM; VmVendor = Sun Microsystems Inc.; VmVersion = 14.3-b01; BootClassPathSupported = true; InputArguments = [ -ea, -Xms128M, -Xmx4G, -XX:TargetSurvivorRatio=90, -XX:+AggressiveOpts, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:+HeapDumpOnOutOfMemoryError, -XX:SurvivorRatio=128, -XX:MaxTenuringThreshold=0, -Dcom.sun.management.jmxremote.port=9004, -Dcom.sun.management.jmxremote.ssl=false, -Dcom.sun.management.jmxremote.authenticate=false, -Dstorage-config=/outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf, -Dcassandra-pidfile=/var/run/cassandra.pid ]; ManagementSpecVersion = 1.2; SpecName = Java Virtual Machine Specification; SpecVendor = Sun Microsystems Inc.; SpecVersion = 1.0; StartTime = 1272911001415; ...