It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my physical memory. These are 15GB VMs, so that's 7.5GB for Cassandra. I would have expected that to work, but I will override to 13 GB just to see what happens.
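
For reference, here is a minimal sketch of the sizing arithmetic described above: default to half of physical memory, with an optional explicit override. The names `half_heap_mb` and `HEAP_OVERRIDE` are hypothetical and are not taken from cassandra-env.sh; the real script may compute the value differently.

```shell
# Sketch only: mirrors the "half of physical memory" default described above.
# half_heap_mb and HEAP_OVERRIDE are hypothetical names, not Cassandra's own.
half_heap_mb() {
    # $1: total system memory in MB; prints half of it with an "M" suffix
    echo "$(( $1 / 2 ))M"
}

# On Linux, MemTotal in /proc/meminfo is reported in kB; convert to MB.
total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo 2>/dev/null || true)

# Use the computed default unless an explicit override is set in the environment.
MAX_HEAP_SIZE="${HEAP_OVERRIDE:-$(half_heap_mb "${total_mb:-0}")}"
echo "MAX_HEAP_SIZE=${MAX_HEAP_SIZE}"
```

With 15360 MB of memory the helper yields 7680M, which matches the 7.5GB figure; running with `HEAP_OVERRIDE=13G` set would correspond to the override I plan to try.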
I've also got the JNA thing set up. Do you think this would cause the crashes, or is it just a performance improvement?

On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote:

> The key JVM options for Cassandra are in cassandra.in.sh.
>
> What is your min and max heap size?
>
> The default setting of max heap size is 1GB. How much RAM do your nodes have?
> You may want to increase this setting. You can also set the -Xmx and -Xms
> options to the same value to keep Java from having to manage heap growth. On
> a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a
> lot more on 64-bit.
>
> Try messing with some of the other settings in the cassandra.in.sh file.
>
> You may not have DEBUG mode turned on for Cassandra and therefore may not be
> getting the full details of what's going on when the server crashes. In the
> <cassandra-home>/conf/log4j-server.properties file, set this line from the
> default of INFO to DEBUG:
>
> log4j.rootLogger=INFO,stdout,R
>
> Also, you haven't configured JNA on this server. Here's some info about it
> and how to configure it:
>
> JNA provides Java programs easy access to native shared libraries without
> writing anything but Java code.
>
> Note from Cassandra developers for why JNA is needed:
> "Linux aggressively swaps out infrequently used memory to make more room for
> its file system buffer cache. Unfortunately, modern generational garbage
> collectors like the JVM's leave parts of its heap un-touched for relatively
> large amounts of time, leading Linux to swap it out. When the JVM finally
> goes to use or GC that memory, swap hell ensues.
>
> Setting swappiness to zero can mitigate this behavior but does not eliminate
> it entirely. Turning off swap entirely is effective. But to avoid surprising
> people who don't know about this behavior, the best solution is to tell Linux
> not to swap out the JVM, and that is what we do now with mlockall via JNA.
>
> Because of licensing issues, we can't distribute JNA with Cassandra, so you
> must manually add it to the Cassandra lib/ directory or otherwise place it on
> the classpath. If the JNA jar is not present, Cassandra will continue as
> before."
>
> Get JNA with:
> cd ~
> wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb
>
> To install:
> techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
> (Reading database ... 44334 files and directories currently installed.)
> Preparing to replace libjna-java 3.2.4-2 (using libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
> Unpacking replacement libjna-java ...
> Setting up libjna-java (3.2.7-0~nmu.2) ...
>
> The deb package will install the JNA jar file to /usr/share/java/jna.jar, but
> Cassandra only loads it if it's in the class path. The easy way to do this is
> to just create a symlink into your Cassandra lib directory (note: replace
> /home/techlabs with your home dir location):
> ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib
>
> Research:
> http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/
>
> - Sameer
>
> On Thu, May 12, 2011 at 4:15 PM, James Cipar <jci...@cmu.edu> wrote:
> I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB
> unique data), to a cluster of 10 servers. I'm using batch_mutate, and
> breaking the data up into chunks of about 10k records. Each record is about
> 5KB, so a total of about 50MB per batch. When I upload a smaller 2 GB data
> set, everything works fine. When I upload the 20 GB data set, servers will
> occasionally crash. Currently I have my client code automatically detect
> this and restart the server, but that is less than ideal.
>
> I'm not sure what information to gather to determine what's going on here.
> Here is a sample of a log file from when a crash occurred. The crash was
> immediately after the log entry tagged "2011-05-12 19:02:19,377".
> Any idea what's going on here? Any other info I can gather to try to debug this?
>
> INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 7774142464
> INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 7774142464
> INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 50) Creating new commitlog segment /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
> INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
> INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
> INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 7774142464
> INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 7774142464
> INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) Completed flushing /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 bytes)
> INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) Discarding obsolete commit log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
> INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 7774142464
> INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
> INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
> INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
> INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) Logging initialized
> INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) Heap size: 7634681856/7635730432
> INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. Native methods will be disabled.
> INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading settings from file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
> INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
> INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
> INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
> INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
> INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
> INFO [main] 2011-05-12 19:02:21,342 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-2
> INFO [main] 2011-05-12 19:02:21,344 SSTableReader.java (line 154) Opening /mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-1
> INFO [main] 2011-05-12 19:02:21,379 DatabaseDescriptor.java (line 461) Loading schema version 9467ffe0-7cea-11e0-8ddc-f74ef74e382f
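
One way to gather more information from logs like the one quoted above is to track heap usage after each collection. The following is a rough sketch that assumes GCInspector entries appear one per line in the log file, in the "leaving N used; max is M" form shown in the sample. `gc_heap_pct` is a hypothetical helper name, not a Cassandra tool.

```shell
# Sketch only: report heap usage after each ParNew collection logged by GCInspector.
# gc_heap_pct is a hypothetical helper; it reads log text on stdin.
gc_heap_pct() {
    awk '/GCInspector/ && /GC for ParNew/ {
        used = 0; max = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "leaving") used = $(i + 1)   # bytes in use after the GC
            if ($i == "is")      max  = $(i + 1)   # heap ceiling from "max is"
        }
        # Print date, time, and the truncated integer percentage of heap in use.
        if (max > 0) printf "%s %s %d%%\n", $3, $4, int(used * 100 / max)
    }'
}
```

It would be run against the actual log file, for example `gc_heap_pct < system.log` (the file location is an assumption). Working the same ratio by hand for the five GCInspector entries in the sample gives roughly 70%, 71%, 40%, 22%, and 11%, so heap usage was falling, not climbing, in the seconds before the restart.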