> I checked the flushwriter thread pool stats and saw this:
> Pool Name                    Active   Pending   Completed   Blocked   All time blocked
> FlushWriter                       1         5       86183         1              17582
That's not good. 
Is the IO system over-utilised?

> In my setup, I have one extremely high traffic column family that is
Any secondary indexes? If so, see the comments for memtable_flush_queue_size in
the yaml file.
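
Roughly, the relevant section of a 1.2-era cassandra.yaml looks like the sketch
below (comments paraphrased from memory rather than copied, values are the ones
you quoted, so double check your own file):

    # Number of full memtables allowed to queue up waiting for a flush
    # writer thread. When a CF with secondary indexes is flushed, the
    # index CFs are flushed along with it and take up queue slots, so set
    # this to at least the maximum number of indexes on any single CF.
    memtable_flush_queue_size: 4

    # Number of threads writing flushed memtables to disk, usually one
    # per data directory / physical disk.
    memtable_flush_writers: 1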

> From the following log output, it looks like the cf with the large
> data load is blocking the flush of the other cf's. 
Not sure that is the case. 
In those log messages the commit log is rotating and needs to free up an old
log segment, so (on the OptionalTasks thread) it is flushing all of the CFs
that have something written to that segment.
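
For reference, the segment size driving that rotation comes from cassandra.yaml;
a paraphrased sketch (not the exact shipped comment, and 32 is the usual 1.2
default, so check your file):

    # Size of an individual commit log segment. Before an old segment can be
    # freed / recycled, every CF that still has un-flushed writes sitting in
    # it is enqueued for flushing, which is what shows up on OptionalTasks.
    commitlog_segment_size_in_mb: 32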


This could be IO not keeping up; it's unlikely to be a switch lock issue if
you only have 4 CFs. Also, have you checked for GC messages in the C* logs?

Cheers
 

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/03/2013, at 12:25 PM, Jasdeep Hundal <dsjas...@gmail.com> wrote:

> Thanks for the info, got a couple of follow up questions, and just as
> a note, this is on Cassandra 1.2.0.
> 
> I checked the flushwriter thread pool stats and saw this:
> Pool Name                    Active   Pending   Completed   Blocked   All time blocked
> FlushWriter                       1         5       86183         1              17582
> 
> Also, memtable_flush_queue_size is set to 4, and
> memtable_flush_writers is set to 1.
> 
> In my setup, I have one extremely high traffic column family that is
> flushing lots of data at once (occasionally hitting hundreds of
> megabytes), and several smaller cf's for which flushes involve only a
> few bytes of data.
> 
> From the following log output, it looks like the cf with the large
> data load is blocking the flush of the other cf's. Would increasing
> memtable_flush_queue_size (I've got plenty of memory) and
> memtable_flush_writers allow the smaller cf flushes to return faster?
> Or given that I see the smaller cf's being flushed when not much has
> been written to them, should I try reducing
> commitlog_segment_size_in_mb?
> 
> 168026:2013-03-18 17:53:41,938 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-data@2098591494(458518528/458518528 serialized/live bytes,
> 111732 ops)
> 168028:2013-03-18 17:53:47,064 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-metadata@252512204(2295/2295 serialized/live bytes, 64 ops)
> 168029:2013-03-18 17:53:47,065 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-counters@544926156(363/363 serialized/live bytes, 12 ops)
> 168030:2013-03-18 17:53:47,066 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-container_counters@1703633084(430/430 serialized/live bytes,
> 83 ops)
> 168032:2013-03-18 17:53:51,950 INFO  [FlushWriter:3]
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents
> (Memtable.java:458) - Completed flushing
> /mnt/test/jasdeep/data/jasdeep-data-ia-720454-Data.db (391890130
> bytes) for commitlog position ReplayPosition(segmentId=1363628611044,
> position=21069295)
> 168050:2013-03-18 17:53:55,948 INFO  [FlushWriter:3]
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents
> (Memtable.java:458) - Completed flushing
> /mnt/test/jasdeep/metadata/jasdeep-metadata-ia-1280-Data.db (833
> bytes) for commitlog position ReplayPosition(segmentId=1363628611047,
> position=4213859)
> 168052:2013-03-18 17:53:55,966 INFO  [FlushWriter:3]
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents
> (Memtable.java:458) - Completed flushing
> /mnt/test/jasdeep/counters/jasdeep-counters-ia-1204-Data.db (342
> bytes) for commitlog position ReplayPosition(segmentId=1363628611047,
> position=4213859)
> 
> Thanks again,
> Jasdeep
> 
> 
> 
> On Mon, Mar 18, 2013 at 10:24 AM, aaron morton <aa...@thelastpickle.com> 
> wrote:
>> 1. With a ConsistencyLevel of quorum, does
>> FBUtilities.waitForFutures() wait for read repair to complete before
>> returning?
>> 
>> No, that's just a utility method.
>> Nothing on the read path waits for Read Repair; it is controlled by the
>> read_repair_chance CF property and is entirely async to the client request.
>> There is no CL; the messages are sent to individual nodes.
>> 
>> 2. When read repair applies a mutation, it needs to obtain a lock for
>> the associated memtable.
>> 
>> What lock are you referring to?
>> When Read Repair (the RowDataResolver) wants to send a mutation it uses the
>> MessageServer. On the write path there is a server-wide RW lock called the
>> sync lock.
>> 
>> I've seen readrepair spend a few seconds stalling in
>> org.apache.cassandra.db.Table.apply).
>> 
>> This could be contention around the sync lock; look for blocked tasks in
>> the flush writer thread pool.
>> 
>> I did a talk on Cassandra internals at ApacheCon 3 weeks ago. Not sure when
>> the video is going to be up, but here are the slides:
>> http://www.slideshare.net/aaronmorton/apachecon-nafeb2013
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 16/03/2013, at 12:21 PM, Jasdeep Hundal <dsjas...@gmail.com> wrote:
>> 
>> I've got a couple of questions related to issues I'm encountering using
>> Cassandra under a heavy write load:
>> 
>> 1. With a ConsistencyLevel of quorum, does
>> FBUtilities.waitForFutures() wait for read repair to complete before
>> returning?
>> 2. When read repair applies a mutation, it needs to obtain a lock for
>> the associated memtable. Does compaction obtain this same lock? (I'm
>> asking because I've seen readrepair spend a few seconds stalling in
>> org.apache.cassandra.db.Table.apply).
>> 
>> Thanks,
>> Jasdeep
>> 
>> 
