Correct.

On Fri, Oct 21, 2011 at 6:47 AM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:
> I could be totally wrong here, but if you are doing a QUORUM read and there
> is a bad value encountered from the QUORUM, won't a repair happen? I thought
> read_repair_chance 0 just means it won't query extra nodes to check for bad
> values.
>
> -Jeremiah
>
> On Oct 17, 2011, at 4:22 PM, Jeremy Hanna wrote:
>
>> Even after disabling hinted handoff and setting read_repair_chance to 0 on
>> all our column families, we were still experiencing massive writes.
>> Apparently the read_repair_chance is completely ignored at any CL higher
>> than CL.ONE. So we were doing CL.QUORUM on reads and writes and seeing
>> massive writes still. It was because of the background read repairs being
>> done. We did extensive logging and checking and that's all it could be, as
>> no mutations were coming in via thrift to those column families.
>>
>> In any case, just wanted to give some follow-up here as it's been an
>> inexplicable rock in our backpack, and hopefully this clears up where that
>> setting is actually used. I'll update the storage configuration wiki to
>> include that caveat as well.
>>
>> On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote:
>>
>>> Thanks for the insights. I may first try disabling hinted handoff for one
>>> run of our data pipeline and see if it exhibits the same behavior. Will
>>> post back if I see anything enlightening there.
>>>
>>> On Sep 10, 2011, at 5:04 PM, Chris Goffinet wrote:
>>>
>>>> You could tail the commit log with `strings` to see what keys are being
>>>> inserted.
>>>>
>>>> On Sat, Sep 10, 2011 at 2:24 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> Two possibilities:
>>>>
>>>> 1) Hinted handoff (this will show up in the logs on the sending
>>>> machine; on the receiving one it will just look like any other write)
>>>>
>>>> 2) You have something doing writes that you're not aware of. I guess
>>>> you could track that down using wireshark to see where the write
>>>> messages are coming from.
>>>>
>>>> On Sat, Sep 10, 2011 at 3:56 PM, Jeremy Hanna
>>>> <jeremy.hanna1...@gmail.com> wrote:
>>>>> Oh, and we're running 0.8.4 and the RF is 3.
>>>>>
>>>>> On Sep 10, 2011, at 3:49 PM, Jeremy Hanna wrote:
>>>>>
>>>>>> In addition, the mutation stage and the read stage are backed up like:
>>>>>>
>>>>>> Pool Name                    Active   Pending   Blocked
>>>>>> ReadStage                        32       773         0
>>>>>> RequestResponseStage              0         0         0
>>>>>> ReadRepairStage                   0         0         0
>>>>>> MutationStage                   158    525918         0
>>>>>> ReplicateOnWriteStage             0         0         0
>>>>>> GossipStage                       0         0         0
>>>>>> AntiEntropyStage                  0         0         0
>>>>>> MigrationStage                    0         0         0
>>>>>> StreamStage                       0         0         0
>>>>>> MemtablePostFlusher               1         5         0
>>>>>> FILEUTILS-DELETE-POOL             0         0         0
>>>>>> FlushWriter                       2         5         0
>>>>>> MiscStage                         0         0         0
>>>>>> FlushSorter                       0         0         0
>>>>>> InternalResponseStage             0         0         0
>>>>>> HintedHandoff                     0         0         0
>>>>>> CompactionManager               n/a        29
>>>>>> MessagingService                n/a      0,34
>>>>>>
>>>>>> On Sep 10, 2011, at 3:38 PM, Jeremy Hanna wrote:
>>>>>>
>>>>>>> We are experiencing massive writes to column families when only doing
>>>>>>> reads from Cassandra. A set of 5 Hadoop jobs are reading from
>>>>>>> Cassandra and then writing out to HDFS. That is the only thing
>>>>>>> operating on the cluster. We are reading at CL.QUORUM with Hadoop and
>>>>>>> have written with CL.QUORUM. Read repair chance is set to 0.0 on all
>>>>>>> column families. However, in the logs, I'm seeing flush after flush of
>>>>>>> memtables and compactions taking place. Is there something else that
>>>>>>> would be writing based on the above description?
>>>>>>>
>>>>>>> Jeremy
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of DataStax, the source for professional Cassandra support
>>>> http://www.datastax.com
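
For anyone retracing this later: the writes can be confirmed as cluster-internal (read repair or hints) rather than a stray client with a few quick checks. This is only a sketch; the interface name, commit log path, and ports below are 0.8-era defaults and may differ on your install:

    # thread pool view - repair writes show up as MutationStage activity
    nodetool -h localhost tpstats

    # dump readable keys from the commit log (default 0.8 location shown)
    strings /var/lib/cassandra/commitlog/CommitLog-*.log | less

    # check whether any client is actually sending writes over Thrift
    # (default rpc_port 9160), and watch inter-node traffic on the
    # storage port (default 7000), which is where read repair mutations travel
    tcpdump -i eth0 -n -w thrift.pcap port 9160
    tcpdump -i eth0 -n -w internode.pcap port 7000

If the Thrift capture shows no incoming mutations while MutationStage stays busy, the mutations are being generated inside the cluster.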
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
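
For reference, the two settings discussed above live in different places: hinted handoff is a per-node option in cassandra.yaml, while read_repair_chance is a per-column-family attribute. On 0.8.x something along these lines should set them (keyspace and column family names are placeholders):

    # cassandra.yaml on each node
    hinted_handoff_enabled: false

    # per column family, via cassandra-cli
    use MyKeyspace;
    update column family MyColumnFamily with read_repair_chance = 0.0;

As the thread concludes, read_repair_chance = 0 only stops the extra replicas from being queried and repaired in the background; at CL.QUORUM, a mismatch among the replicas that answered the read still triggers a repair write.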