What's the output of 'nodetool tpstats' while this is happening?
Specifically, is the Flush Writer "All time blocked" count increasing? If so,
try raising memtable_flush_writers and memtable_flush_queue_size and see
whether that helps.
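
For example, in cassandra.yaml (the values below are just illustrative
starting points, not tuned recommendations; if memory serves, the defaults
are one writer per data directory and a queue of 4):

    memtable_flush_writers: 2
    memtable_flush_queue_size: 8

After a restart, watch 'nodetool tpstats' again and see whether the
"All time blocked" count stops growing.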


On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwi...@fold3.com> wrote:

> A few days ago I posted about an issue I'm having where GC takes a long
> time (20-30 seconds), and it happens repeatedly and basically no work gets
> done. I've done further investigation, and I now believe that I know the
> cause. If I do a lot of deletes, it creates memory pressure until the
> memtables are flushed, but Cassandra doesn't flush them. If I manually
> flush, then life is good again (although that takes a very long time
> because of the GC issue). If I just leave the flushing to Cassandra, then I
> end up with death by GC. I believe that when the memtables are full of
> tombstones, Cassandra doesn't realize how much memory the memtables are
> actually taking up, and so it doesn't proactively flush them in order to
> free up heap.
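>
> For reference, the manual flush I've been running is just the stock
> nodetool command (the keyspace name here is made up):
>
>     nodetool flush bdnks bdn_index_pub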
>
> As I was deleting records out of one of my tables, I was watching it via
> nodetool cfstats, and I found a very curious thing:
>
>                 Memtable cell count: 1285
>                 Memtable data size, bytes: 0
>                 Memtable switch count: 56
>
> As the deletion process was chugging away, the memtable cell count
> increased, as expected, but the data size stayed at 0. No flushing
> occurred.
>
> Here's the schema for this table:
>
> CREATE TABLE bdn_index_pub (
>     tshard VARCHAR,
>     pord INT,
>     ord INT,
>     hpath VARCHAR,
>     page BIGINT,
>     PRIMARY KEY (tshard, pord)
> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>
> I have a few tables that I run this cleaning process on, and not all of
> them exhibit this behavior. One of them reported an increasing number of
> bytes, as expected, and it also flushed as expected. Here's the schema for
> that table:
>
> CREATE TABLE bdn_index_child (
>     ptshard VARCHAR,
>     ord INT,
>     hpath VARCHAR,
>     PRIMARY KEY (ptshard, ord)
> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>
> In both cases, I'm deleting the entire record (i.e. specifying just the
> first component of the primary key in the delete statement). Most records
> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
> just a handful of rows per record, but a few can have up to 10,000.
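>
> For example, a single delete looks like this (the shard value is made up):
>
>     DELETE FROM bdn_index_pub WHERE tshard = 'shard-0042';
>
> which drops the whole record (every row under that tshard) in one statement.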
>
> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
> doesn't seem like nearly enough to create a memory problem. Perhaps there
> are other flaws in the memory metering. Or perhaps there is some other
> issue that causes Cassandra to mismanage the heap when there are a lot of
> deletes. One other thought I had is that I page through these tables and
> clean them out as I go. Perhaps there is some interaction between the
> paging and the deleting that causes the GC problems and I should create a
> list of keys to delete and then delete them after I've finished reading the
> entire table.
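>
> Something like this, instead of interleaving reads and deletes (assuming
> DISTINCT on a partition key works on 2.0, which I believe it does):
>
>     -- pass 1: collect the keys while paging through the table
>     SELECT DISTINCT tshard FROM bdn_index_pub;
>
>     -- pass 2: issue the deletes only after the read has finished
>     DELETE FROM bdn_index_pub WHERE tshard = ?;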
>
> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to
> 1 GB, in hopes that it would force Cassandra to flush tables before I ran
> into death by GC, but it didn't seem to help.
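>
> That is, in cassandra.yaml:
>
>     memtable_total_space_in_mb: 1024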
>
> I'm using Cassandra 2.0.4.
>
> Any insights would be greatly appreciated. I can't be the only one that
> has periodic delete-heavy workloads. Hopefully someone else has run into
> this and can give advice.
>
> Thanks
>
> Robert
>



-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
