Are you using 32 bit hosts? If not don't be scared of mmap using a lot of address space, you have plenty. It won't make you swap more than using buffered i/o.
On Tue, May 4, 2010 at 1:57 PM, Ran Tavory <ran...@gmail.com> wrote: > I canceled mmap and indeed memory usage is sane again. So far performance > hasn't been great, but I'll wait and see. > I'm also interested in a way to cap mmap so I can take advantage of it but > not swap the host to death... > > On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung <kyu...@discovereads.com> > wrote: >> >> This sounds just like the slowness I was asking about in another thread - >> after a lot of reads, the machine uses up all available memory on the box >> and then starts swapping. >> My understanding was that mmap helps greatly with read and write perf >> (until the box starts swapping I guess)...is there any way to use mmap and >> cap how much memory it takes up? >> What do people use in production? mmap or no mmap? >> Thanks! >> Kyusik Chung >> On May 4, 2010, at 10:11 AM, Schubert Zhang wrote: >> >> 1. When initially startup your nodes, please plan your InitialToken of >> each node evenly. >> 2. <DiskAccessMode>standard</DiskAccessMode> >> >> On Tue, May 4, 2010 at 9:09 PM, Boris Shulman <shulm...@gmail.com> wrote: >>> >>> I think that the extra (more than 4GB) memory usage comes from the >>> mmaped io, that is why it happens only for reads. >>> >>> On Tue, May 4, 2010 at 2:02 PM, Jordan Pittier <jordan.pitt...@gmail.com> >>> wrote: >>> > I'm facing the same issue with swap. It only occurs when I perform read >>> > operations (write are very fast :)). So I can't help you with the >>> > memory >>> > probleme. >>> > >>> > But to balance the load evenly between nodes in cluster just manually >>> > fix >>> > their token.(the "formula" is i * 2^127 / nb_nodes). >>> > >>> > Jordzn >>> > >>> > On Tue, May 4, 2010 at 8:20 AM, Ran Tavory <ran...@gmail.com> wrote: >>> >> >>> >> I'm looking into performance issues on a 0.6.1 cluster. I see two >>> >> symptoms: >>> >> 1. Reads and writes are slow >>> >> 2. One of the hosts is doing a lot of GC. >>> >> 1 is slow in the sense that in normal state the cluster used to make >>> >> around 3-5k read and writes per second (6-10k operations per second), >>> >> but >>> >> how it's in the order of 200-400 ops per second, sometimes even less. >>> >> 2 looks like this: >>> >> $ tail -f /outbrain/cassandra/log/system.log >>> >> INFO [GC inspection] 2010-05-04 00:42:18,636 GCInspector.java (line >>> >> 110) >>> >> GC for ParNew: 672 ms, 166482384 reclaimed leaving 2872087208 used; >>> >> max is >>> >> 4432068608 >>> >> INFO [GC inspection] 2010-05-04 00:42:28,638 GCInspector.java (line >>> >> 110) >>> >> GC for ParNew: 498 ms, 166493352 reclaimed leaving 2836049448 used; >>> >> max is >>> >> 4432068608 >>> >> INFO [GC inspection] 2010-05-04 00:42:38,640 GCInspector.java (line >>> >> 110) >>> >> GC for ParNew: 327 ms, 166091528 reclaimed leaving 2796888424 used; >>> >> max is >>> >> 4432068608 >>> >> ... and it goes on and on for hours, no stopping... >>> >> The cluster is made of 6 hosts, 3 in one DC and 3 in another. >>> >> Each host has 8G RAM. >>> >> -Xmx=4G >>> >> For some reason, the load isn't distributed evenly b/w the hosts, >>> >> although >>> >> I'm not sure this is the cause for slowness >>> >> $ nodetool -h localhost -p 9004 ring >>> >> Address Status Load Range >>> >> Ring >>> >> >>> >> 144413773383729447702215082383444206680 >>> >> 192.168.252.99Up 15.94 GB >>> >> 66002764663998929243644931915471302076 |<--| >>> >> 192.168.254.57Up 19.84 GB >>> >> 81288739225600737067856268063987022738 | ^ >>> >> 192.168.254.58Up 973.78 MB >>> >> 86999744104066390588161689990810839743 v | >>> >> 192.168.252.62Up 5.18 GB >>> >> 88308919879653155454332084719458267849 | ^ >>> >> 192.168.254.59Up 10.57 GB >>> >> 142482163220375328195837946953175033937 v | >>> >> 192.168.252.61Up 11.36 GB >>> >> 144413773383729447702215082383444206680 |-->| >>> >> The slow host is 192.168.252.61 and it isn't the most loaded one. >>> >> The host is waiting a lot on IO and the load average is usually 6-7 >>> >> $ w >>> >> 00:42:56 up 11 days, 13:22, 1 user, load average: 6.21, 5.52, 3.93 >>> >> $ vmstat 5 >>> >> procs -----------memory---------- ---swap-- -----io---- --system-- >>> >> -----cpu------ >>> >> r b swpd free buff cache si so bi bo in cs us >>> >> sy id >>> >> wa st >>> >> 0 8 2147844 45744 1816 4457384 6 5 66 32 5 2 1 >>> >> 1 >>> >> 96 2 0 >>> >> 0 8 2147164 49020 1808 4451596 385 0 2345 58 3372 9957 2 >>> >> 2 >>> >> 78 18 0 >>> >> 0 3 2146432 45704 1812 4453956 342 0 2274 108 3937 10732 >>> >> 2 2 >>> >> 78 19 0 >>> >> 0 1 2146252 44696 1804 4453436 345 164 1939 294 3647 7833 2 >>> >> 2 >>> >> 78 18 0 >>> >> 0 1 2145960 46924 1744 4451260 158 0 2423 122 4354 14597 >>> >> 2 2 >>> >> 77 18 0 >>> >> 7 1 2138344 44676 952 4504148 1722 403 1722 406 1388 439 87 >>> >> 0 >>> >> 10 2 0 >>> >> 7 2 2137248 45652 956 4499436 1384 655 1384 658 1356 392 87 >>> >> 0 >>> >> 10 3 0 >>> >> 7 1 2135976 46764 956 4495020 1366 718 1366 718 1395 380 87 >>> >> 0 >>> >> 9 4 0 >>> >> 0 8 2134484 46964 956 4489420 1673 555 1814 586 1601 215590 >>> >> 14 >>> >> 2 68 16 0 >>> >> 0 1 2135388 47444 972 4488516 785 833 2390 995 3812 8305 2 >>> >> 2 >>> >> 77 20 0 >>> >> 0 10 2135164 45928 980 4488796 788 543 2275 626 36 >>> >> So, the host is swapping like crazy... >>> >> top shows that it's using a lot of memory. As noted before -Xmx=4G and >>> >> nothing else seems to be using a lot of memory on the host except for >>> >> the >>> >> cassandra process, however, of the 8G ram on the host, 92% is used by >>> >> cassandra. How's that? >>> >> Top shows there's 3.9g Shared and 7.2g Resident and 15.9g Virtual. Why >>> >> does it have 15g virtual? And why 7.2 RES? This can explain the >>> >> slowness in >>> >> swapping. >>> >> $ top >>> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> >> >>> >> >>> >> 20281 cassandr 25 0 15.9g 7.2g 3.9g S 33.3 92.6 175:30.27 java >>> >> So, can the total memory be controlled? >>> >> Or perhaps I'm looking in the wrong direction... >>> >> I've looked at all the cassandra JMX counts and nothing seemed >>> >> suspicious >>> >> so far. By suspicious i mean a large number of pending tasks - there >>> >> were >>> >> always very small numbers in each pool. >>> >> About read and write latencies, I'm not sure what the normal state is, >>> >> but >>> >> here's an example of what I see on the problematic host: >>> >> #mbean = org.apache.cassandra.service:type=StorageProxy: >>> >> RecentReadLatencyMicros = 30105.888180684495; >>> >> TotalReadLatencyMicros = 78543052801; >>> >> TotalWriteLatencyMicros = 4213118609; >>> >> RecentWriteLatencyMicros = 1444.4809201925639; >>> >> ReadOperations = 4779553; >>> >> RangeOperations = 0; >>> >> TotalRangeLatencyMicros = 0; >>> >> RecentRangeLatencyMicros = NaN; >>> >> WriteOperations = 4740093; >>> >> And the only pool that I do see some pending tasks is the >>> >> ROW-READ-STAGE, >>> >> but it doesn't look like much, usually around 6-8: >>> >> #mbean = org.apache.cassandra.concurrent:type=ROW-READ-STAGE: >>> >> ActiveCount = 8; >>> >> PendingTasks = 8; >>> >> CompletedTasks = 5427955; >>> >> Any help finding the solution is appreciated, thanks... >>> >> Below are a few more JMXes I collected from the system that may be >>> >> interesting. >>> >> #mbean = java.lang:type=Memory: >>> >> Verbose = false; >>> >> HeapMemoryUsage = { >>> >> committed = 3767279616; >>> >> init = 134217728; >>> >> max = 4293656576; >>> >> used = 1237105080; >>> >> }; >>> >> NonHeapMemoryUsage = { >>> >> committed = 35061760; >>> >> init = 24313856; >>> >> max = 138412032; >>> >> used = 23151320; >>> >> }; >>> >> ObjectPendingFinalizationCount = 0; >>> >> #mbean = java.lang:name=ParNew,type=GarbageCollector: >>> >> LastGcInfo = { >>> >> GcThreadCount = 11; >>> >> duration = 136; >>> >> endTime = 42219272; >>> >> id = 11719; >>> >> memoryUsageAfterGc = { >>> >> ( CMS Perm Gen ) = { >>> >> key = CMS Perm Gen; >>> >> value = { >>> >> committed = 29229056; >>> >> init = 21757952; >>> >> max = 88080384; >>> >> used = 17648848; >>> >> }; >>> >> }; >>> >> ( Code Cache ) = { >>> >> key = Code Cache; >>> >> value = { >>> >> committed = 5832704; >>> >> init = 2555904; >>> >> max = 50331648; >>> >> used = 5563520; >>> >> }; >>> >> }; >>> >> ( CMS Old Gen ) = { >>> >> key = CMS Old Gen; >>> >> value = { >>> >> committed = 3594133504; >>> >> init = 112459776; >>> >> max = 4120510464; >>> >> used = 964565720; >>> >> }; >>> >> }; >>> >> ( Par Eden Space ) = { >>> >> key = Par Eden Space; >>> >> value = { >>> >> committed = 171835392; >>> >> init = 21495808; >>> >> max = 171835392; >>> >> used = 0; >>> >> }; >>> >> }; >>> >> ( Par Survivor Space ) = { >>> >> key = Par Survivor Space; >>> >> value = { >>> >> committed = 1310720; >>> >> init = 131072; >>> >> max = 1310720; >>> >> used = 0; >>> >> }; >>> >> }; >>> >> }; >>> >> memoryUsageBeforeGc = { >>> >> ( CMS Perm Gen ) = { >>> >> key = CMS Perm Gen; >>> >> value = { >>> >> committed = 29229056; >>> >> init = 21757952; >>> >> max = 88080384; >>> >> used = 17648848; >>> >> }; >>> >> }; >>> >> ( Code Cache ) = { >>> >> key = Code Cache; >>> >> value = { >>> >> committed = 5832704; >>> >> init = 2555904; >>> >> max = 50331648; >>> >> used = 5563520; >>> >> }; >>> >> }; >>> >> ( CMS Old Gen ) = { >>> >> key = CMS Old Gen; >>> >> value = { >>> >> committed = 3594133504; >>> >> init = 112459776; >>> >> max = 4120510464; >>> >> used = 959221872; >>> >> }; >>> >> }; >>> >> ( Par Eden Space ) = { >>> >> key = Par Eden Space; >>> >> value = { >>> >> committed = 171835392; >>> >> init = 21495808; >>> >> max = 171835392; >>> >> used = 171835392; >>> >> }; >>> >> }; >>> >> ( Par Survivor Space ) = { >>> >> key = Par Survivor Space; >>> >> value = { >>> >> committed = 1310720; >>> >> init = 131072; >>> >> max = 1310720; >>> >> used = 0; >>> >> }; >>> >> }; >>> >> }; >>> >> startTime = 42219136; >>> >> }; >>> >> CollectionCount = 11720; >>> >> CollectionTime = 4561730; >>> >> Name = ParNew; >>> >> Valid = true; >>> >> MemoryPoolNames = [ Par Eden Space, Par Survivor Space ]; >>> >> #mbean = java.lang:type=OperatingSystem: >>> >> MaxFileDescriptorCount = 63536; >>> >> OpenFileDescriptorCount = 75; >>> >> CommittedVirtualMemorySize = 17787711488; >>> >> FreePhysicalMemorySize = 45522944; >>> >> FreeSwapSpaceSize = 2123968512; >>> >> ProcessCpuTime = 12251460000000; >>> >> TotalPhysicalMemorySize = 8364417024; >>> >> TotalSwapSpaceSize = 4294959104; >>> >> Name = Linux; >>> >> AvailableProcessors = 8; >>> >> Arch = amd64; >>> >> SystemLoadAverage = 4.36; >>> >> Version = 2.6.18-164.15.1.el5; >>> >> #mbean = java.lang:type=Runtime: >>> >> Name = 20...@ob1061.nydc1.outbrain.com; >>> >> >>> >> ClassPath = >>> >> >>> >> /outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../build/classes:/outbrain/cassandra/apache-cassandra-0.6.1/bin/.. >>> >> >>> >> >>> >> /lib/antlr-3.1.3.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/apache-cassandra-0.6.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/avro-1.2.0-dev.jar:/outb >>> >> >>> >> >>> >> rain/cassandra/apache-cassandra-0.6.1/bin/../lib/clhm-production.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-cli-1.1.jar:/outbrain/cassandra/apache-cassandra- >>> >> >>> >> >>> >> 0.6.1/bin/../lib/commons-codec-1.2.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-collections-3.2.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/com >>> >> >>> >> >>> >> mons-lang-2.4.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/google-collections-1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/hadoop-core-0.20.1.jar:/out >>> >> >>> >> >>> >> brain/cassandra/apache-cassandra-0.6.1/bin/../lib/high-scale-lib.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/ivy-2.1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/ >>> >> >>> >> >>> >> bin/../lib/jackson-core-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jackson-mapper-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jline >>> >> >>> >> >>> >> -0.9.94.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/json-simple-1.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/libthrift-r917130.jar:/outbrain/cassandr >>> >> >>> >> >>> >> a/apache-cassandra-0.6.1/bin/../lib/log4j-1.2.14.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/slf4j-api-1.5.8.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib >>> >> /slf4j-log4j12-1.5.8.jar; >>> >> >>> >> BootClassPath = >>> >> >>> >> /usr/java/jdk1.6.0_17/jre/lib/alt-rt.jar:/usr/java/jdk1.6.0_17/jre/lib/resources.jar:/usr/java/jdk1.6.0_17/jre/lib/rt.jar:/usr/java/jdk1.6.0_17/jre/lib/sunrsasign.j >>> >> >>> >> >>> >> ar:/usr/java/jdk1.6.0_17/jre/lib/jsse.jar:/usr/java/jdk1.6.0_17/jre/lib/jce.jar:/usr/java/jdk1.6.0_17/jre/lib/charsets.jar:/usr/java/jdk1.6.0_17/jre/classes; >>> >> >>> >> LibraryPath = >>> >> >>> >> /usr/java/jdk1.6.0_17/jre/lib/amd64/server:/usr/java/jdk1.6.0_17/jre/lib/amd64:/usr/java/jdk1.6.0_17/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib; >>> >> >>> >> VmName = Java HotSpot(TM) 64-Bit Server VM; >>> >> >>> >> VmVendor = Sun Microsystems Inc.; >>> >> >>> >> VmVersion = 14.3-b01; >>> >> >>> >> BootClassPathSupported = true; >>> >> >>> >> InputArguments = [ -ea, -Xms128M, -Xmx4G, -XX:TargetSurvivorRatio=90, >>> >> -XX:+AggressiveOpts, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, >>> >> -XX:+CMSParallelRemarkEnabled, -XX:+HeapDumpOnOutOfMemoryError, >>> >> -XX:SurvivorRatio=128, -XX:MaxTenuringThreshold=0, >>> >> -Dcom.sun.management.jmxremote.port=9004, >>> >> -Dcom.sun.management.jmxremote.ssl=false, >>> >> -Dcom.sun.management.jmxremote.authenticate=false, >>> >> >>> >> -Dstorage-config=/outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf, >>> >> -Dcassandra-pidfile=/var/run/cassandra.pid ]; >>> >> >>> >> ManagementSpecVersion = 1.2; >>> >> >>> >> SpecName = Java Virtual Machine Specification; >>> >> >>> >> SpecVendor = Sun Microsystems Inc.; >>> >> >>> >> SpecVersion = 1.0; >>> >> >>> >> StartTime = 1272911001415; >>> >> ... >>> > >> >> > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com