I believe there is a bug, and I have filed a ticket for it: https://issues.apache.org/jira/browse/CASSANDRA-6655

I will have a patch uploaded shortly, but it has just missed the 2.0.5 release window, so you'll either need to grab the development branch once it's committed or wait until 2.0.6.
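In the meantime, flushing by hand (as you've been doing) is the workaround. A minimal sketch, assuming your keyspace is named bdn (the name is a guess; substitute your own, and the table arguments are optional):

    nodetool flush bdn bdn_index_pub bdn_index_child

Running that periodically during the deletion phase should keep the tombstone-laden memtables from accumulating on the heap.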
On 5 February 2014 15:09, Robert Wille <rwi...@fold3.com> wrote:

> Yes. It's kind of an unusual workload: an insertion phase followed by a
> deletion phase, generally not overlapping.
>
> From: Benedict Elliott Smith <belliottsm...@datastax.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Tuesday, February 4, 2014 at 5:29 PM
> To: <user@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> Is it possible you are generating *exclusively* deletes for this table?
>
> On 5 February 2014 00:10, Robert Wille <rwi...@fold3.com> wrote:
>
>> I ran my test again, and Flush Writer's "All time blocked" increased to
>> 2, and shortly thereafter GC went into its death spiral. I doubled
>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
>> tried again.
>>
>> This time, the table that had always sat with "Memtable data size" = 0
>> now showed increases in memtable data size. That was encouraging. It
>> never flushed, which isn't too surprising, because that table has
>> relatively few rows and they are pretty wide. However, on the fourth
>> table to clean, Flush Writer's "All time blocked" went to 1, then there
>> were no more completed events, and about 10 minutes later GC went into
>> its death spiral. I assume that each event Flush Writer completes means
>> a table was flushed. Is that right? Also, I got two dropped-mutation
>> messages at the same time that "All time blocked" incremented.
>>
>> I then increased the writers and queue size to 3 and 12, respectively,
>> and ran my test again. This time "All time blocked" remained at 0, but
>> I still suffered death by GC.
>>
>> I would almost think this is caused by high load on the server, but
>> I've never seen CPU utilization go above about two of my eight
>> available cores. If high load triggers this problem, that is very
>> disconcerting: it means a CPU spike could permanently cripple a node.
>> Okay, not permanently, but until a manual flush occurs.
>>
>> If anyone has any further thoughts, I'd love to hear them. I'm quite at
>> the end of my rope.
>>
>> Thanks in advance
>>
>> Robert
>>
>> From: Nate McCall <n...@thelastpickle.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Saturday, February 1, 2014 at 9:25 AM
>> To: Cassandra Users <user@cassandra.apache.org>
>> Subject: Re: Lots of deletions results in death by GC
>>
>> What's the output of 'nodetool tpstats' while this is happening?
>> Specifically, is Flush Writer's "All time blocked" increasing? If so,
>> play around with turning up memtable_flush_writers and
>> memtable_flush_queue_size and see if that helps.
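>>
>> Both settings live in cassandra.yaml and are picked up on restart. As a
>> rough sketch (starting values only; tune from there):
>>
>>     memtable_flush_writers: 2
>>     memtable_flush_queue_size: 8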
>>
>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwi...@fold3.com> wrote:
>>
>>> A few days ago I posted about an issue I'm having where GC takes a
>>> long time (20-30 seconds), and it happens repeatedly, so basically no
>>> work gets done. I've done further investigation, and I now believe I
>>> know the cause. If I do a lot of deletes, it creates memory pressure
>>> until the memtables are flushed, but Cassandra doesn't flush them. If
>>> I manually flush, then life is good again (although that takes a very
>>> long time because of the GC issue). If I just leave the flushing to
>>> Cassandra, I end up with death by GC.
>>>
>>> I believe that when the memtables are full of tombstones, Cassandra
>>> doesn't realize how much memory the memtables are actually taking up,
>>> and so it doesn't proactively flush them in order to free up heap.
>>>
>>> As I was deleting records out of one of my tables, I was watching it
>>> via nodetool cfstats, and I found a very curious thing:
>>>
>>>     Memtable cell count: 1285
>>>     Memtable data size, bytes: 0
>>>     Memtable switch count: 56
>>>
>>> As the deletion process was chugging away, the memtable cell count
>>> increased, as expected, but the data size stayed at 0. No flushing
>>> occurred.
>>>
>>> Here's the schema for this table:
>>>
>>>     CREATE TABLE bdn_index_pub (
>>>         tshard VARCHAR,
>>>         pord INT,
>>>         ord INT,
>>>         hpath VARCHAR,
>>>         page BIGINT,
>>>         PRIMARY KEY (tshard, pord)
>>>     ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>     'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>
>>> I have a few tables that I run this cleaning process on, and not all
>>> of them exhibit this behavior. One of them reported an increasing
>>> number of bytes, as expected, and it also flushed as expected. Here's
>>> the schema for that table:
>>>
>>>     CREATE TABLE bdn_index_child (
>>>         ptshard VARCHAR,
>>>         ord INT,
>>>         hpath VARCHAR,
>>>         PRIMARY KEY (ptshard, ord)
>>>     ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>     'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>
>>> In both cases, I'm deleting the entire record (i.e. specifying just
>>> the first component of the primary key in the delete statement). Most
>>> records in bdn_index_pub have about 10,000 rows. bdn_index_child
>>> usually has just a handful of rows, but a few records can have up to
>>> 10,000.
>>>
>>> Still a further mystery: 1285 tombstones in the bdn_index_pub memtable
>>> doesn't seem like nearly enough to create a memory problem. Perhaps
>>> there are other flaws in the memory metering, or perhaps some other
>>> issue causes Cassandra to mismanage the heap when there are a lot of
>>> deletes. One other thought I had is that I page through these tables
>>> and clean them out as I go. Perhaps there is some interaction between
>>> the paging and the deleting that causes the GC problems, and I should
>>> build a list of keys to delete and then delete them after I've
>>> finished reading the entire table.
>>>
>>> I reduced memtable_total_space_in_mb from the default (probably 2.7
>>> GB) to 1 GB, in hopes that it would force Cassandra to flush tables
>>> before I ran into death by GC, but it didn't seem to help.
>>>
>>> I'm using Cassandra 2.0.4.
>>>
>>> Any insights would be greatly appreciated. I can't be the only one
>>> with periodic delete-heavy workloads. Hopefully someone else has run
>>> into this and can give advice.
>>>
>>> Thanks
>>>
>>> Robert
>>
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
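P.S. For anyone following along: the deletes Robert describes name only the first primary key component, so they look something like this (table names are from his schemas; the key values here are invented):

    DELETE FROM bdn_index_pub WHERE tshard = 'shard-0042';
    DELETE FROM bdn_index_child WHERE ptshard = 'shard-0042-p1';

Each statement covers an entire partition (up to ~10,000 rows) with a single partition-level tombstone, which appears to be metered as zero bytes. That is consistent with cfstats showing the memtable cell count climbing while "Memtable data size" stays at 0, so no size-based flush is ever triggered.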