Thank you so much. Everything I had seen pointed to this being the case. I'm
glad that someone in the know has confirmed this bug and fixed it. Now I just
need to figure out where to go from here: do I wait, use the dev branch, or
work around it?

Robert
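For the work-around option: since a manual flush reportedly brings the node
back to life, one stop-gap is simply to force flushes of the delete-heavy
tables on a timer while the deletion phase runs. A minimal sketch in Python;
the keyspace name "myks" is hypothetical and the five-minute interval is only
a guess to be tuned against the delete rate:

    # Stop-gap until the CASSANDRA-6655 fix ships: periodically force a
    # flush so tombstone-filled memtables cannot sit unflushed and pin
    # the heap. "myks" is a placeholder keyspace name.
    import subprocess
    import time

    while True:
        subprocess.check_call(
            ["nodetool", "flush", "myks", "bdn_index_pub", "bdn_index_child"])
        time.sleep(300)

The same effect can be had from cron; the point is just that the flush happens
on a schedule instead of waiting for Cassandra's (mis-metered) memory
accounting to trigger it.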
From: Benedict Elliott Smith <belliottsm...@datastax.com>
Reply-To: <user@cassandra.apache.org>
Date: Wednesday, February 5, 2014 at 8:32 AM
To: <user@cassandra.apache.org>
Subject: Re: Lots of deletions results in death by GC

I believe there is a bug, and I have filed a ticket for it:
https://issues.apache.org/jira/browse/CASSANDRA-6655

I will have a patch uploaded shortly, but it's just missed the 2.0.5 release
window, so you'll either need to grab the development branch once it's
committed or wait until 2.0.6.

On 5 February 2014 15:09, Robert Wille <rwi...@fold3.com> wrote:
> Yes. It's kind of an unusual workload: an insertion phase followed by a
> deletion phase, generally not overlapping.
>
> From: Benedict Elliott Smith <belliottsm...@datastax.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Tuesday, February 4, 2014 at 5:29 PM
> To: <user@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> Is it possible you are generating exclusively deletes for this table?
>
> On 5 February 2014 00:10, Robert Wille <rwi...@fold3.com> wrote:
>> I ran my test again, and Flush Writer's "All time blocked" increased to
>> 2, and shortly thereafter GC went into its death spiral. I doubled
>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
>> tried again.
>>
>> This time, the table that always sat with Memtable data size = 0 now
>> showed increases in Memtable data size. That was encouraging. It never
>> flushed, which isn't too surprising, because that table has relatively
>> few rows and they are pretty wide. However, on the fourth table to
>> clean, Flush Writer's "All time blocked" went to 1, and then there were
>> no more completed events, and about 10 minutes later GC went into its
>> death spiral. I assume that each time Flush Writer completes an event,
>> that means a table was flushed. Is that right? Also, I got two dropped
>> mutation messages at the same time that Flush Writer's "All time
>> blocked" incremented.
>>
>> I then increased the writers and queue size to 3 and 12, respectively,
>> and ran my test again. This time "All time blocked" remained at 0, but
>> I still suffered death by GC.
>>
>> I would almost think that this is caused by high load on the server,
>> but I've never seen CPU utilization go above about two of my eight
>> available cores. If high load triggers this problem, then that is very
>> disconcerting: it means a CPU spike could permanently cripple a node.
>> Okay, not permanently, but until a manual flush occurs.
>>
>> If anyone has any further thoughts, I'd love to hear them. I'm quite at
>> the end of my rope.
>>
>> Thanks in advance
>>
>> Robert
>>
>> From: Nate McCall <n...@thelastpickle.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Saturday, February 1, 2014 at 9:25 AM
>> To: Cassandra Users <user@cassandra.apache.org>
>> Subject: Re: Lots of deletions results in death by GC
>>
>> What's the output of 'nodetool tpstats' while this is happening?
>> Specifically, is Flush Writer "All time blocked" increasing? If so,
>> play around with turning up memtable_flush_writers and
>> memtable_flush_queue_size and see if that helps.
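One lightweight way to watch the counter Nate mentions is to poll tpstats.
A sketch, assuming the "FlushWriter" pool name as printed by Cassandra 2.0's
tpstats output and an arbitrary ten-second poll interval:

    # Poll `nodetool tpstats` and print the FlushWriter line; its last
    # column is the "All time blocked" count discussed above.
    import subprocess
    import time

    while True:
        out = subprocess.check_output(["nodetool", "tpstats"]).decode()
        for line in out.splitlines():
            if line.startswith("FlushWriter"):
                print(line)
        time.sleep(10)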
>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwi...@fold3.com> wrote:
>>> A few days ago I posted about an issue I'm having where GC takes a
>>> long time (20-30 seconds), and it happens repeatedly and basically no
>>> work gets done. I've done further investigation, and I now believe
>>> that I know the cause. If I do a lot of deletes, it creates memory
>>> pressure until the memtables are flushed, but Cassandra doesn't flush
>>> them. If I manually flush, then life is good again (although that
>>> takes a very long time because of the GC issue). If I just leave the
>>> flushing to Cassandra, then I end up with death by GC. I believe that
>>> when the memtables are full of tombstones, Cassandra doesn't realize
>>> how much memory the memtables are actually taking up, and so it
>>> doesn't proactively flush them in order to free up heap.
>>>
>>> As I was deleting records out of one of my tables, I was watching it
>>> via nodetool cfstats, and I found a very curious thing:
>>>
>>> Memtable cell count: 1285
>>> Memtable data size, bytes: 0
>>> Memtable switch count: 56
>>>
>>> As the deletion process was chugging away, the memtable cell count
>>> increased, as expected, but the data size stayed at 0. No flushing
>>> occurred.
>>>
>>> Here's the schema for this table:
>>>
>>> CREATE TABLE bdn_index_pub (
>>>     tshard VARCHAR,
>>>     pord INT,
>>>     ord INT,
>>>     hpath VARCHAR,
>>>     page BIGINT,
>>>     PRIMARY KEY (tshard, pord)
>>> ) WITH gc_grace_seconds = 0
>>>   AND compaction = { 'class' : 'LeveledCompactionStrategy',
>>>                      'sstable_size_in_mb' : 160 };
>>>
>>> I have a few tables that I run this cleaning process on, and not all
>>> of them exhibit this behavior. One of them reported an increasing
>>> number of bytes, as expected, and it also flushed as expected. Here's
>>> the schema for that table:
>>>
>>> CREATE TABLE bdn_index_child (
>>>     ptshard VARCHAR,
>>>     ord INT,
>>>     hpath VARCHAR,
>>>     PRIMARY KEY (ptshard, ord)
>>> ) WITH gc_grace_seconds = 0
>>>   AND compaction = { 'class' : 'LeveledCompactionStrategy',
>>>                      'sstable_size_in_mb' : 160 };
>>>
>>> In both cases, I'm deleting the entire record (i.e. specifying just
>>> the first component of the primary key in the delete statement). Most
>>> records in bdn_index_pub have 10,000 rows per record. bdn_index_child
>>> usually has just a handful of rows, but a few records can have up to
>>> 10,000.
>>>
>>> Still a further mystery: 1285 tombstones in the bdn_index_pub memtable
>>> doesn't seem like nearly enough to create a memory problem. Perhaps
>>> there are other flaws in the memory metering. Or perhaps there is some
>>> other issue that causes Cassandra to mismanage the heap when there are
>>> a lot of deletes. One other thought I had is that I page through these
>>> tables and clean them out as I go. Perhaps there is some interaction
>>> between the paging and the deleting that causes the GC problems, and I
>>> should build a list of keys to delete and then delete them after I've
>>> finished reading the entire table (see the sketch at the end of this
>>> thread).
>>>
>>> I reduced memtable_total_space_in_mb from the default (probably
>>> 2.7 GB) to 1 GB, in hopes that it would force Cassandra to flush
>>> tables before I ran into death by GC, but it didn't seem to help.
>>>
>>> I'm using Cassandra 2.0.4.
>>>
>>> Any insights would be greatly appreciated. I can't be the only one who
>>> has periodic delete-heavy workloads. Hopefully someone else has run
>>> into this and can give advice.
>>>
>>> Thanks
>>>
>>> Robert
>>
>>
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
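Robert's "build a list of keys first, then delete" idea from the original
message could look roughly like the sketch below, written with the DataStax
Python driver. The keyspace name "myks" and the contact point are
assumptions, and SELECT DISTINCT on a partition key requires Cassandra 2.0+:

    # Phase 1: page through the table and collect every partition key.
    # Phase 2: delete whole partitions only after reading has finished,
    # so paged reads never have to step over freshly written tombstones.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])    # assumed contact point
    session = cluster.connect("myks")   # hypothetical keyspace name

    shards = [row.tshard for row in
              session.execute("SELECT DISTINCT tshard FROM bdn_index_pub")]

    delete = session.prepare("DELETE FROM bdn_index_pub WHERE tshard = ?")
    for shard in shards:
        session.execute(delete, (shard,))

    cluster.shutdown()

Deleting by just the partition key (tshard) removes the whole record, exactly
as described in the original message; separating the read pass from the
delete pass is what keeps paging from interacting with the fresh tombstones.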