Yes. It's kind of an unusual workload. An insertion phase followed by a deletion phase, generally not overlapping.
From: Benedict Elliott Smith <belliottsm...@datastax.com>
Reply-To: <user@cassandra.apache.org>
Date: Tuesday, February 4, 2014 at 5:29 PM
To: <user@cassandra.apache.org>
Subject: Re: Lots of deletions results in death by GC

Is it possible you are generating exclusively deletes for this table?

On 5 February 2014 00:10, Robert Wille <rwi...@fold3.com> wrote:
> I ran my test again, and Flush Writer's "All time blocked" increased to 2 and
> then shortly thereafter GC went into its death spiral. I doubled
> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
> again.
>
> This time, the table that always sat with Memtable data size = 0 now showed
> increases in Memtable data size. That was encouraging. It never flushed, which
> isn't too surprising, because that table has relatively few rows and they are
> pretty wide. However, on the fourth table to clean, Flush Writer's "All time
> blocked" went to 1, and then there were no more completed events, and about 10
> minutes later GC went into its death spiral. I assume that each time Flush
> Writer completes an event, that means a table was flushed. Is that right?
> Also, I got two dropped mutation messages at the same time that Flush Writer's
> All time blocked incremented.
>
> I then increased the writers and queue size to 3 and 12, respectively, and ran
> my test again. This time All time blocked remained at 0, but I still suffered
> death by GC.
>
> I would almost think that this is caused by high load on the server, but I've
> never seen CPU utilization go above about two of my eight available cores. If
> high load triggers this problem, then that is very disconcerting. That means
> that a CPU spike could permanently cripple a node. Okay, not permanently, but
> until a manual flush occurs.
>
> If anyone has any further thoughts, I'd love to hear them. I'm quite at the
> end of my rope.
>
> Thanks in advance
>
> Robert
>
> From: Nate McCall <n...@thelastpickle.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Saturday, February 1, 2014 at 9:25 AM
> To: Cassandra Users <user@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> What's the output of 'nodetool tpstats' while this is happening? Specifically
> is Flush Writer "All time blocked" increasing? If so, play around with turning
> up memtable_flush_writers and memtable_flush_queue_size and see if that helps.
>
>
> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwi...@fold3.com> wrote:
>> A few days ago I posted about an issue I'm having where GC takes a long time
>> (20-30 seconds), and it happens repeatedly and basically no work gets done.
>> I've done further investigation, and I now believe that I know the cause. If
>> I do a lot of deletes, it creates memory pressure until the memtables are
>> flushed, but Cassandra doesn't flush them. If I manually flush, then life is
>> good again (although that takes a very long time because of the GC issue). If
>> I just leave the flushing to Cassandra, then I end up with death by GC. I
>> believe that when the memtables are full of tombstones, Cassandra doesn't
>> realize how much memory the memtables are actually taking up, and so it
>> doesn't proactively flush them in order to free up heap.
>>
>> As I was deleting records out of one of my tables, I was watching it via
>> nodetool cfstats, and I found a very curious thing:
>>
>> Memtable cell count: 1285
>> Memtable data size, bytes: 0
>> Memtable switch count: 56
>>
>> As the deletion process was chugging away, the memtable cell count increased,
>> as expected, but the data size stayed at 0. No flushing occurred.
>>
>> Here's the schema for this table:
>>
>> CREATE TABLE bdn_index_pub (
>>     tshard VARCHAR,
>>     pord INT,
>>     ord INT,
>>     hpath VARCHAR,
>>     page BIGINT,
>>     PRIMARY KEY (tshard, pord)
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>>
>> I have a few tables that I run this cleaning process on, and not all of them
>> exhibit this behavior. One of them reported an increasing number of bytes, as
>> expected, and it also flushed as expected. Here's the schema for that table:
>>
>>
>> CREATE TABLE bdn_index_child (
>>     ptshard VARCHAR,
>>     ord INT,
>>     hpath VARCHAR,
>>     PRIMARY KEY (ptshard, ord)
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>>
>> In both cases, I'm deleting the entire record (i.e. specifying just the first
>> component of the primary key in the delete statement). Most records in
>> bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just a
>> handful of rows, but a few records can have up to 10,000.
>>
>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>> doesn't seem like nearly enough to create a memory problem. Perhaps there are
>> other flaws in the memory metering. Or perhaps there is some other issue that
>> causes Cassandra to mismanage the heap when there are a lot of deletes. One
>> other thought I had is that I page through these tables and clean them out as
>> I go. Perhaps there is some interaction between the paging and the deleting
>> that causes the GC problems and I should create a list of keys to delete and
>> then delete them after I've finished reading the entire table.
>>
>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
>> GB, in hopes that it would force Cassandra to flush tables before I ran into
>> death by GC, but it didn't seem to help.
>>
>> I'm using Cassandra 2.0.4.
>>
>> Any insights would be greatly appreciated. I can't be the only one that has
>> periodic delete-heavy workloads. Hopefully someone else has run into this and
>> can give advice.
>>
>> Thanks
>>
>> Robert
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
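
Nate's suggestion in the thread, checking whether Flush Writer "All time blocked" climbs in 'nodetool tpstats', could be scripted so the counter is watched while a delete-heavy job runs. What follows is only a minimal sketch in Python and was not posted by anyone in the thread. It assumes tpstats prints one row per pool ending in five integer columns (Active, Pending, Completed, Blocked, All time blocked) and that the pool is labeled either FlushWriter or Flush Writer. The function name and the sample text are made up for illustration.

```python
def flush_writer_all_time_blocked(tpstats_text):
    """Return the 'All time blocked' count for the flush writer pool, or None.

    Assumes each pool row is: <pool name> followed by five integer columns,
    the last of which is 'All time blocked'.
    """
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # too short to be a pool row
        numbers = fields[-5:]
        if not all(tok.isdigit() for tok in numbers):
            continue  # header line or some other section of the output
        # Join the name tokens so "Flush Writer" and "FlushWriter" both match.
        pool = "".join(fields[:-5]).lower()
        if pool == "flushwriter":
            return int(numbers[-1])
    return None


# Made-up sample in the assumed layout (not output taken from the thread).
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0          10482         0                 0
FlushWriter                       1         4             56         0                 2
"""

blocked = flush_writer_all_time_blocked(SAMPLE)
```

Calling the function on successive captures of tpstats output and comparing the returned values would show whether flushes are backing up, which is the condition under which raising memtable_flush_writers and memtable_flush_queue_size was suggested.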