Thank you so much. Everything I had seen pointed to this being the case. I'm
glad that someone in the know has confirmed this bug and fixed it. Now I
just need to figure out where to go from here: do I wait, use the dev branch,
or work around it?
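
As a stopgap, the work-around presumably amounts to flushing the affected
tables by hand after each deletion pass, something like the following (the
keyspace name here is only a placeholder):

# flush the tombstone-heavy tables manually after a deletion pass;
# "myks" stands in for the real keyspace name
nodetool flush myks bdn_index_pub bdn_index_child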

Robert

From:  Benedict Elliott Smith <belliottsm...@datastax.com>
Reply-To:  <user@cassandra.apache.org>
Date:  Wednesday, February 5, 2014 at 8:32 AM
To:  <user@cassandra.apache.org>
Subject:  Re: Lots of deletions results in death by GC

I believe there is a bug, and I have filed a ticket for it:
https://issues.apache.org/jira/browse/CASSANDRA-6655

I will have a patch uploaded shortly, but it's just missed the 2.0.5 release
window, so you'll either need to grab the development branch once it's
committed or wait until 2.0.6.
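
If you go the development-branch route, one way to build it is from the Apache
Cassandra git mirror (assuming the fix lands on the cassandra-2.0 branch; check
the JIRA ticket for the exact branch once the patch is committed):

# clone the source and build a snapshot of the 2.0 branch with ant
git clone https://github.com/apache/cassandra.git
cd cassandra
git checkout cassandra-2.0
ant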


On 5 February 2014 15:09, Robert Wille <rwi...@fold3.com> wrote:
> Yes. It's kind of an unusual workload. An insertion phase followed by a
> deletion phase, generally not overlapping.
> 
> From:  Benedict Elliott Smith <belliottsm...@datastax.com>
> Reply-To:  <user@cassandra.apache.org>
> Date:  Tuesday, February 4, 2014 at 5:29 PM
> To:  <user@cassandra.apache.org>
> 
> Subject:  Re: Lots of deletions results in death by GC
> 
> Is it possible you are generating exclusively deletes for this table?
> 
> 
> On 5 February 2014 00:10, Robert Wille <rwi...@fold3.com> wrote:
>> I ran my test again, and Flush Writer's "All time blocked" increased to 2 and
>> then shortly thereafter GC went into its death spiral. I doubled
>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
>> again.
>> 
>> This time, the table that always sat with Memtable data size = 0 now showed
>> increases in Memtable data size. That was encouraging. It never flushed,
>> which isn't too surprising, because that table has relatively few rows and
>> they are pretty wide. However, on the fourth table to clean, Flush Writer's
>> "All time blocked" went to 1, and then there were no more completed events,
>> and about 10 minutes later GC went into its death spiral. I assume that each
>> time Flush Writer completes an event, that means a table was flushed. Is that
>> right? Also, I got two dropped mutation messages at the same time that Flush
>> Writer's "All time blocked" incremented.
>> 
>> I then increased the writers and queue size to 3 and 12, respectively, and
>> ran my test again. This time All time blocked remained at 0, but I still
>> suffered death by GC.
>> 
>> I would almost think that this is caused by high load on the server, but I've
>> never seen CPU utilization go above about two of my eight available cores. If
>> high load triggers this problem, then that is very disconcerting. That means
>> that a CPU spike could permanently cripple a node. Okay, not permanently, but
>> until a manual flush occurs.
>> 
>> If anyone has any further thoughts, I'd love to hear them. I'm quite at the
>> end of my rope.
>> 
>> Thanks in advance
>> 
>> Robert
>> 
>> From:  Nate McCall <n...@thelastpickle.com>
>> Reply-To:  <user@cassandra.apache.org>
>> Date:  Saturday, February 1, 2014 at 9:25 AM
>> To:  Cassandra Users <user@cassandra.apache.org>
>> Subject:  Re: Lots of deletions results in death by GC
>> 
>> What's the output of 'nodetool tpstats' while this is happening? Specifically,
>> is Flush Writer's "All time blocked" increasing? If so, play around with
>> turning up memtable_flush_writers and memtable_flush_queue_size and see if
>> that helps.
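>> 
>> For reference, a sketch of where those knobs live in cassandra.yaml (the
>> values shown just mirror the experiment above and are illustrative, not
>> recommendations):
>> 
>> # cassandra.yaml
>> memtable_flush_writers: 2      # concurrent memtable flush threads
>> memtable_flush_queue_size: 8   # memtables allowed to queue while waiting to flush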
>> 
>> 
>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwi...@fold3.com> wrote:
>>> A few days ago I posted about an issue I'm having where GC takes a long time
>>> (20-30 seconds), and it happens repeatedly and basically no work gets done.
>>> I've done further investigation, and I now believe that I know the cause. If
>>> I do a lot of deletes, it creates memory pressure until the memtables are
>>> flushed, but Cassandra doesn't flush them. If I manually flush, then life is
>>> good again (although that takes a very long time because of the GC issue).
>>> If I just leave the flushing to Cassandra, then I end up with death by GC. I
>>> believe that when the memtables are full of tombstones, Cassandra doesn't
>>> realize how much memory the memtables are actually taking up, and so it
>>> doesn't proactively flush them in order to free up heap.
>>> 
>>> As I was deleting records out of one of my tables, I was watching it via
>>> nodetool cfstats, and I found a very curious thing:
>>> 
>>>                 Memtable cell count: 1285
>>>                 Memtable data size, bytes: 0
>>>                 Memtable switch count: 56
>>> 
>>> As the deletion process was chugging away, the memtable cell count
>>> increased, as expected, but the data size stayed at 0. No flushing occurred.
>>> 
>>> Here's the schema for this table:
>>> 
>>> CREATE TABLE bdn_index_pub (
>>>     tshard VARCHAR,
>>>     pord INT,
>>>     ord INT,
>>>     hpath VARCHAR,
>>>     page BIGINT,
>>>     PRIMARY KEY (tshard, pord)
>>> ) WITH gc_grace_seconds = 0
>>>   AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>> 
>>> 
>>> I have a few tables that I run this cleaning process on, and not all of them
>>> exhibit this behavior. One of them reported an increasing number of bytes,
>>> as expected, and it also flushed as expected. Here's the schema for that
>>> table:
>>> 
>>> 
>>> CREATE TABLE bdn_index_child (
>>>     ptshard VARCHAR,
>>>     ord INT,
>>>     hpath VARCHAR,
>>>     PRIMARY KEY (ptshard, ord)
>>> ) WITH gc_grace_seconds = 0
>>>   AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>> 
>>> 
>>> In both cases, I'm deleting the entire record (i.e. specifying just the
>>> first component of the primary key in the delete statement). Most records in
>>> bdn_index_pub have 10,000 rows. bdn_index_child usually has just a handful
>>> of rows per record, but a few records can have up to 10,000.
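>>> 
>>> To make the shape of the deletes concrete, they look roughly like this (the
>>> shard values are placeholders):
>>> 
>>> DELETE FROM bdn_index_pub WHERE tshard = 'shard-0001';    -- drops the whole partition
>>> DELETE FROM bdn_index_child WHERE ptshard = 'shard-0001';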
>>> 
>>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>>> doesn't seem like nearly enough to create a memory problem. Perhaps there
>>> are other flaws in the memory metering. Or perhaps there is some other issue
>>> that causes Cassandra to mismanage the heap when there are a lot of deletes.
>>> One other thought I had is that I page through these tables and clean them
>>> out as I go. Perhaps there is some interaction between the paging and the
>>> deleting that causes the GC problems, and I should create a list of keys to
>>> delete and then delete them after I've finished reading the entire table.
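>>> 
>>> A rough sketch of that two-pass alternative, assuming SELECT DISTINCT on the
>>> partition key is usable here (it was added in the 2.0 line):
>>> 
>>> -- pass 1: collect the partition keys without deleting anything
>>> SELECT DISTINCT tshard FROM bdn_index_pub;
>>> -- pass 2: after the read is finished, delete each collected partition
>>> DELETE FROM bdn_index_pub WHERE tshard = ?;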
>>> 
>>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
>>> GB, in hopes that it would force Cassandra to flush tables before I ran into
>>> death by GC, but it didn't seem to help.
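>>> 
>>> That change is just one line in cassandra.yaml (1 GB expressed in MB):
>>> 
>>> memtable_total_space_in_mb: 1024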
>>> 
>>> I'm using Cassandra 2.0.4.
>>> 
>>> Any insights would be greatly appreciated. I can't be the only one that has
>>> periodic delete-heavy workloads. Hopefully someone else has run into this
>>> and can give advice.
>>> 
>>> Thanks
>>> 
>>> Robert
>> 
>> 
>> 
>> -- 
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>> 
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
> 


