Also, I am using batch_mutate for all of my writes.

Lee Parker

On Mon, May 17, 2010 at 7:11 PM, Lee Parker <l...@socialagency.com> wrote:
> What are your storage-conf settings for Memtable thresholds? One thing
> that could cause lots of CPU usage is dumping the memtables too frequently
> and then having to do lots of compaction. With that much available heap
> space you could definitely go larger than the default thresholds. Also, do
> you not have any swap space set up on the machine? It is a good idea to at
> least set up a swap file so that the system can use it when it needs to.
>
> We are running a two-node cluster using Amazon large EC2 instances as well.
> The cluster is using a replication factor of 2 and most of my writes and
> reads are at a consistency level of ONE except for a few QUORUM calls. The
> only difference in my JVM opts is that my max is set at 6G. I have the two
> ephemeral disks set up as a RAID 0 array and that is where I'm storing the
> data. The commit logs are going to the default location so they are using
> the local disk. We currently have more than 90G of data running on these
> and have only had issues with CPU utilization when our code was accidentally
> duplicating content to one of the servers. This duplication of content
> started causing the server to be in a state of constant major compaction and
> it couldn't keep up with new writes. In the end, I completely dropped that
> server and spun up another one to take its place since the one good server
> had all the data anyway. So, it might have also been an issue with that
> box.
>
> One more question, are all of the instances in the same region?
>
> Lee Parker
> On Mon, May 17, 2010 at 6:02 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>
>> Here are the current jvm args and java version:
>>
>> # Arguments to pass to the JVM
>> JVM_OPTS=" \
>>         -ea \
>>         -Xms128M \
>>         -Xmx7G \
>>         -XX:TargetSurvivorRatio=90 \
>>         -XX:+AggressiveOpts \
>>         -XX:+UseParNewGC \
>>         -XX:+UseConcMarkSweepGC \
>>         -XX:+CMSParallelRemarkEnabled \
>>         -XX:+HeapDumpOnOutOfMemoryError \
>>         -XX:SurvivorRatio=128 \
>>         -XX:MaxTenuringThreshold=0 \
>>         -Dcom.sun.management.jmxremote.port=8080 \
>>         -Dcom.sun.management.jmxremote.ssl=false \
>>         -Dcom.sun.management.jmxremote.authenticate=false"
>>
>> java -version outputs:
>> java version "1.6.0_20"
>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>
>> So pretty much the defaults aside from the 7Gig max heap. CPU is totally
>> hammered right now, and it is receiving 0 ops/sec from me since I have
>> disconnected it from our application until I can figure out what's
>> going on.
>>
>> running top on the machine I get:
>> top - 18:56:32 up 2 days, 20:57,  2 users,  load average: 14.97, 15.24, 15.13
>> Tasks:  87 total,   5 running,  82 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 40.1%us, 33.9%sy,  0.0%ni,  0.1%id,  0.0%wa,  0.0%hi,  1.3%si, 24.6%st
>> Mem:   7872040k total,  3618764k used,  4253276k free,   387536k buffers
>> Swap:        0k total,        0k used,        0k free,  1655556k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  2566 cassandr  25   0 7906m 639m  10m S  150  8.3  5846:35 java
>>
>>
>> I have jconsole up and running, and jconsole vm Summary tab says:
>> - Total physical memory: 7,872,040 K
>> - Free physical memory: 4,253,036 K
>> - Total swap space: 0 K
>> - Free swap space: 0 K
>> - Committed virtual memory: 8,096,648 K
>>
>> Is there a specific thread I can look at in jconsole that might give me a
>> clue? It's weird that it's still at 100% cpu even though it's getting no
>> traffic from outside right now.
>> I suppose it might still be talking across
>> the machines though.
>>
>> Also, stopping and starting cassandra on one of the 4 machines
>> caused the CPU to go back down to almost normal levels.
>>
>> Here's the ring:
>> Address       Status     Load          Range                                      Ring
>>                                        170141183460469231731687303715884105728
>> 10.251.XX.XX  Up         2.15 MB       42535295865117307932921825928971026432     |<--|
>> 10.250.XX.XX  Up         2.42 MB       85070591730234615865843651857942052864     |   |
>> 10.250.XX.XX  Up         2.47 MB       127605887595351923798765477786913079296    |   |
>> 10.250.XX.XX  Up         2.46 MB       170141183460469231731687303715884105728    |-->|
>>
>> Any thoughts?
>>
>> Best,
>>
>> Curt
>> --
>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>> http://apps.facebook.com/happyhabitat
>>
>>
>> On Mon, May 17, 2010 at 3:51 PM, Mark Greene <green...@gmail.com> wrote:
>>
>>> Can you provide us with the current JVM args? Also, what type of
>>> workload are you giving the ring (op/s)?
>>>
>>>
>>> On Mon, May 17, 2010 at 6:39 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>>>
>>>> Hello Cassandra users+experts,
>>>>
>>>> Hopefully someone will be able to point me in the correct direction. We
>>>> have cassandra 0.6.1 working on our test servers and we *thought*
>>>> everything was great and ready to move to production. We are currently
>>>> running a ring of 4 large instance EC2
>>>> (http://aws.amazon.com/ec2/instance-types/) servers on production with a
>>>> replication factor of 3 and a QUORUM consistency level. We ran a test on
>>>> 1% of our users, and everything was writing to and reading from cassandra
>>>> great for the first 3 hours. After that point CPU usage spiked to 100%
>>>> and stayed there, basically on all 4 machines at once. This smells to me
>>>> like a GC issue, and I'm looking into it with jconsole right now. If
>>>> anyone can help me debug this and get cassandra all the way up and
>>>> running without CPU spiking I would be forever in their
>>>> debt.
>>>>
>>>> I suspect that anyone else running cassandra on large EC2 instances
>>>> might just be able to tell me what JVM args they are successfully using
>>>> in a production environment, whether they upgraded to Cassandra 0.6.2
>>>> from 0.6.1, and whether they went to batched writes due to bug 1014
>>>> (https://issues.apache.org/jira/browse/CASSANDRA-1014). That might answer
>>>> all my questions.
>>>>
>>>> Is there anyone on the list who is using large EC2 instances in
>>>> production? Would you be kind enough to share your JVM arguments and any
>>>> other tips?
>>>>
>>>> Thanks for any help,
>>>> Curt
>>>> --
>>>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>>>> http://apps.facebook.com/happyhabitat
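
A side note on the ring listing quoted in this thread: the four tokens look evenly spaced, so token placement is probably not what is skewing the load. Here is a minimal Python sketch to check that, assuming the cluster uses RandomPartitioner with a token space of 2**127 (the thread does not say which partitioner is configured, so treat that as an assumption). The helper name balanced_tokens is made up for illustration and is not part of Cassandra.

```python
# Sketch: evenly spaced tokens for an N-node ring.
# Assumption: RandomPartitioner with a token space of 2**127.
# balanced_tokens is a hypothetical helper, not a Cassandra API.

def balanced_tokens(node_count, ring_size=2**127):
    """Return node_count tokens spaced evenly around the ring."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Python integers are arbitrary precision, so this arithmetic is exact.
    return [i * ring_size // node_count for i in range(1, node_count + 1)]


# Print one token per node for a 4-node ring like the one in the thread.
for token in balanced_tokens(4):
    print(token)
```

For four nodes this gives the same four values shown in the Range column of the ring listing, which suggests each node owns an equal quarter of the token space.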