Agreed, I'll bring Xmx down (a rough sketch of that change is at the bottom of this mail). I also just saw in the storage conf that a higher value for MemtableFlushAfterMinutes is suggested, since otherwise you might get a "flush storm" of all your memtables flushing at once. I've changed that as well.
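For reference, that setting lives in storage-conf.xml and looks roughly like this (the 1440 below is only an example figure, not a recommendation or the value we settled on; tune it to your own write pattern):

    <!-- Maximum time to leave a dirty memtable unflushed.  Set it high
         enough that all of your memtables don't hit this limit, and
         flush, at the same moment.  1440 (24h) is illustrative only. -->
    <MemtableFlushAfterMinutes>1440</MemtableFlushAfterMinutes>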
--
Curt, ZipZapPlay Inc., www.PlayCrafter.com, http://apps.facebook.com/happyhabitat

On Mon, May 17, 2010 at 5:27 PM, Mark Greene <green...@gmail.com> wrote:
> Since you only have 7.5GB of memory, it's a really bad idea to set your
> heap space to a max of 7GB. Remember, the java process heap will be larger
> than what Xmx is allowed to grow to. If you reach this level, you can
> start swapping, which is very, very bad. As Brandon pointed out, you haven't
> exhausted your physical memory yet, but you still want to lower Xmx to
> something like 5, maybe 6 GB.
>
>
> On Mon, May 17, 2010 at 7:02 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>
>> Here are the current JVM args and Java version:
>>
>> # Arguments to pass to the JVM
>> JVM_OPTS=" \
>>         -ea \
>>         -Xms128M \
>>         -Xmx7G \
>>         -XX:TargetSurvivorRatio=90 \
>>         -XX:+AggressiveOpts \
>>         -XX:+UseParNewGC \
>>         -XX:+UseConcMarkSweepGC \
>>         -XX:+CMSParallelRemarkEnabled \
>>         -XX:+HeapDumpOnOutOfMemoryError \
>>         -XX:SurvivorRatio=128 \
>>         -XX:MaxTenuringThreshold=0 \
>>         -Dcom.sun.management.jmxremote.port=8080 \
>>         -Dcom.sun.management.jmxremote.ssl=false \
>>         -Dcom.sun.management.jmxremote.authenticate=false"
>>
>> java -version outputs:
>> java version "1.6.0_20"
>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>
>> So pretty much the defaults aside from the 7 GB max heap. CPU is totally
>> hammered right now, and it is receiving 0 ops/sec from me since I
>> disconnected it from our application until I can figure out what's going on.
>>
>> Running top on the machine I get:
>>
>> top - 18:56:32 up 2 days, 20:57, 2 users, load average: 14.97, 15.24, 15.13
>> Tasks:  87 total,  5 running,  82 sleeping,  0 stopped,  0 zombie
>> Cpu(s): 40.1%us, 33.9%sy, 0.0%ni, 0.1%id, 0.0%wa, 0.0%hi, 1.3%si, 24.6%st
>> Mem:   7872040k total,  3618764k used,  4253276k free,   387536k buffers
>> Swap:        0k total,        0k used,        0k free,  1655556k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  2566 cassandr  25   0 7906m 639m  10m S  150  8.3  5846:35  java
>>
>> I have jconsole up and running, and the jconsole VM Summary tab says:
>> - Total physical memory:    7,872,040 K
>> - Free physical memory:     4,253,036 K
>> - Total swap space:         0 K
>> - Free swap space:          0 K
>> - Committed virtual memory: 8,096,648 K
>>
>> Is there a specific thread I can look at in jconsole that might give me a
>> clue? It's weird that it's still at 100% CPU even though it's getting no
>> traffic from outside right now. I suppose it might still be talking across
>> the machines though.
>>
>> Also, stopping and starting Cassandra on one of the 4 machines caused the
>> CPU to go back down to almost normal levels.
>>
>> Here's the ring:
>>
>> Address       Status   Load      Range                                      Ring
>>                                  170141183460469231731687303715884105728
>> 10.251.XX.XX  Up       2.15 MB   42535295865117307932921825928971026432    |<--|
>> 10.250.XX.XX  Up       2.42 MB   85070591730234615865843651857942052864    |   |
>> 10.250.XX.XX  Up       2.47 MB   127605887595351923798765477786913079296   |   |
>> 10.250.XX.XX  Up       2.46 MB   170141183460469231731687303715884105728   |-->|
>>
>> Any thoughts?
>>
>> Best,
>>
>> Curt
>> --
>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>> http://apps.facebook.com/happyhabitat
>>
>>
>> On Mon, May 17, 2010 at 3:51 PM, Mark Greene <green...@gmail.com> wrote:
>>
>>> Can you provide us with the current JVM args? Also, what type of
>>> workload are you giving the ring (op/s)?
>>>
>>>
>>> On Mon, May 17, 2010 at 6:39 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>>>
>>>> Hello Cassandra users+experts,
>>>>
>>>> Hopefully someone will be able to point me in the correct direction. We
>>>> have Cassandra 0.6.1 working on our test servers and we *thought* everything
>>>> was great and ready to move to production. We are currently running a ring
>>>> of 4 large-instance EC2 (http://aws.amazon.com/ec2/instance-types/)
>>>> servers in production with a replication factor of 3 and a QUORUM
>>>> consistency level. We ran a test on 1% of our users, and everything was
>>>> writing to and reading from Cassandra fine for the first 3 hours. After
>>>> that point CPU usage spiked to 100% and stayed there, on essentially all 4
>>>> machines at once. This smells to me like a GC issue, and I'm looking into it
>>>> with jconsole right now. If anyone can help me debug this and get Cassandra
>>>> all the way up and running without the CPU spiking, I would be forever in
>>>> their debt.
>>>>
>>>> I suspect that anyone else running Cassandra on large EC2 instances
>>>> might just be able to tell me what JVM args they are successfully using in a
>>>> production environment, whether they upgraded to Cassandra 0.6.2 from 0.6.1,
>>>> and whether they went to batched writes due to bug 1014
>>>> (https://issues.apache.org/jira/browse/CASSANDRA-1014). That might answer
>>>> all my questions.
>>>>
>>>> Is there anyone on the list who is using large EC2 instances in
>>>> production? Would you be kind enough to share your JVM arguments and any
>>>> other tips?
>>>>
>>>> Thanks for any help,
>>>> Curt
>>>> --
>>>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>>>> http://apps.facebook.com/happyhabitat
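P.S. For anyone searching the archives later, the heap change Mark suggests amounts to roughly the following in cassandra.in.sh. Only -Xmx differs from what's quoted above, and 5G is just one point in the 5-6 GB range he mentioned, not a tuned value:

    # Same flags as before, with the max heap lowered so the whole java
    # process (heap + thread stacks + native overhead) fits comfortably
    # inside the instance's ~7.5 GB of RAM instead of pushing it into swap.
    JVM_OPTS=" \
            -ea \
            -Xms128M \
            -Xmx5G \
            -XX:TargetSurvivorRatio=90 \
            -XX:+AggressiveOpts \
            -XX:+UseParNewGC \
            -XX:+UseConcMarkSweepGC \
            -XX:+CMSParallelRemarkEnabled \
            -XX:+HeapDumpOnOutOfMemoryError \
            -XX:SurvivorRatio=128 \
            -XX:MaxTenuringThreshold=0 \
            -Dcom.sun.management.jmxremote.port=8080 \
            -Dcom.sun.management.jmxremote.ssl=false \
            -Dcom.sun.management.jmxremote.authenticate=false"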