Some possibilities:

- You didn't adjust the Cassandra heap size in cassandra.in.sh (1GB is too small).

- You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will show large pending ops -- large = 100s).

- You're creating large rows a bit at a time and Cassandra OOMs when it tries to compact them (the OOM should usually be in the compaction thread).

- You have your 5 disks each with a separate data directory, which will allow up to 12 total memtables in flight internally, and 12 * 256MB is too much for the heap size you have (FLUSH-WRITER-STAGE in tpstats will show large pending ops -- large = more than 2 or 3).
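For reference, checking the first and last of these comes down to a couple of commands. This is only a rough sketch against 0.6 -- the nodetool flag spelling and the exact JVM_OPTS line in cassandra.in.sh vary a bit between builds, so treat the specifics as assumptions:

    # Watch the stages named above for pending ops
    # (the host flag may be -host, -h, or --host depending on the build):
    bin/nodetool -host <cassandra-node> tpstats

    # Heap is set via JVM_OPTS in bin/cassandra.in.sh; the stock 0.6 script
    # caps it at roughly 1GB.  On an 8GB box, something like this leaves
    # room for the OS page cache:
    #   JVM_OPTS="... -Xms4G -Xmx4G ..."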
On Tue, May 18, 2010 at 6:24 AM, Ian Soboroff <isobor...@gmail.com> wrote:
> I hope this isn't too much of a newbie question. I am using Cassandra 0.6.1
> on a small cluster of Linux boxes: 14 nodes, each with 8GB RAM and 5 data
> drives. The nodes are running HDFS to serve files within the cluster, but
> at the moment the rest of Hadoop is shut down. I'm trying to load a large
> set of web pages (the ClueWeb collection, with more coming), and my
> Cassandra daemons keep dying.
>
> I'm loading the pages into a simple column family that lets me fetch pages
> by an internal ID or by URL. The biggest thing in a row is the page
> content, maybe 15-20KB per page of raw HTML. There aren't a lot of columns.
> I tried Thrift, Hector, and the BMT interface, and at the moment I'm doing
> batch mutations over Thrift, about 2500 pages per batch, because that was
> fastest for me in testing.
>
> At this point, each Cassandra node has between 500GB and 1.5TB according to
> nodetool ring. Let's say I start the daemons up, and they all go live after
> a couple of minutes of scanning the tables. I then start my importer, which
> is a single Java process reading ClueWeb bundles over HDFS, cutting them
> up, and sending the mutations to Cassandra. I only talk to one node at a
> time, switching to a new node when I get an exception. As the job runs over
> a few hours, the Cassandra daemons eventually fall over, either with no
> error in the log or reporting that they are out of heap.
>
> Each daemon is getting 6GB of RAM and has scads of disk space to play with.
> I've set storage-conf.xml to take 256MB in a memtable before flushing (like
> the BMT case), to do batch commit log flushes, and not to have any caching
> in the CFs. I'm sure I must be tuning something wrong. I would eventually
> like this Cassandra setup to serve a light request load, but over say
> 50-100TB of data. I'd appreciate any help or advice you can offer.
>
> Thanks,
> Ian

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
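For comparison, the settings Ian describes (256MB memtables, batch commit log sync, caching off), plus the multiple data directories flagged above, would look roughly like the fragment below in a 0.6 storage-conf.xml. The element names follow the 0.6 sample config as best I recall, and the keyspace, column family, and paths are placeholders, so check everything against the file shipped with your build:

    <Storage>
      <!-- One entry per data drive; each directory adds to the number
           of memtables/flushes that can be in flight at once -->
      <DataFileDirectories>
        <DataFileDirectory>/data1/cassandra/data</DataFileDirectory>
        <DataFileDirectory>/data2/cassandra/data</DataFileDirectory>
        <!-- ... one per drive ... -->
      </DataFileDirectories>

      <!-- Flush a memtable once it holds about 256MB of data -->
      <MemtableThroughputInMB>256</MemtableThroughputInMB>

      <!-- Batch commit log sync instead of periodic -->
      <CommitLogSync>batch</CommitLogSync>
      <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS>

      <!-- Key and row caches disabled on the page column family
           (replication strategy/factor and snitch settings omitted here) -->
      <Keyspaces>
        <Keyspace Name="ClueWeb">
          <ColumnFamily Name="Pages" CompareWith="BytesType"
                        KeysCached="0" RowsCached="0"/>
        </Keyspace>
      </Keyspaces>
    </Storage>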