I could be totally wrong here, but if you are doing a QUORUM read and one of 
the replicas in the quorum returns a stale value, won't a repair happen 
anyway?  I thought read_repair_chance = 0 just means it won't query the extra 
replicas beyond the quorum to check for stale values.
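
To make sure we're talking about the same thing, here is a toy sketch of the 
distinction I mean -- this is not Cassandra's actual read path, just invented 
names to illustrate the two cases:

import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Toy illustration only -- not Cassandra's code; all names are invented.
public class ReadRepairSketch {
    static final Random RANDOM = new Random();

    static void coordinateRead(List<String> quorumResponses,
                               List<String> extraReplicaResponses,
                               double readRepairChance) {
        String winner = reconcile(quorumResponses);

        // Case 1: the replicas we had to contact anyway (the quorum)
        // disagree.  The coordinator pushes the winning value back to the
        // stale replicas regardless of read_repair_chance -- which still
        // generates writes.
        for (String v : quorumResponses)
            if (!v.equals(winner))
                sendRepairMutation(v, winner);

        // Case 2: read_repair_chance, as I understand it, only gates whether
        // we *also* read (and, if needed, repair) the replicas outside the
        // quorum.
        if (RANDOM.nextDouble() < readRepairChance)
            for (String v : extraReplicaResponses)
                if (!v.equals(winner))
                    sendRepairMutation(v, winner);
    }

    static String reconcile(List<String> responses) {
        return responses.get(responses.size() - 1);  // stand-in for timestamp resolution
    }

    static void sendRepairMutation(String stale, String winner) {
        System.out.println("repair write: " + stale + " -> " + winner);
    }

    public static void main(String[] args) {
        // RF=3, quorum of 2: one stale quorum replica -> one repair write,
        // even with read_repair_chance = 0.
        coordinateRead(Arrays.asList("old", "new"), Arrays.asList("old"), 0.0);
    }
}

If that model is right, quorum reads would generate repair writes whenever 
replicas disagree, even with read_repair_chance at 0.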

-Jeremiah

On Oct 17, 2011, at 4:22 PM, Jeremy Hanna wrote:

> Even after disabling hinted handoff and setting read_repair_chance to 0 on 
> all our column families, we were still experiencing massive writes.  It 
> turns out read_repair_chance is completely ignored at any CL higher than 
> CL.ONE.  We were reading and writing at CL.QUORUM, so we kept seeing massive 
> writes: they were the background read repairs being done.  We did extensive 
> logging and checking, and that's all it could be, as no mutations were 
> coming in via Thrift to those column families.
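> 
> For anyone following along, "setting read_repair_chance to 0 on all our 
> column families" is just a per-CF schema change; roughly, from memory and 
> against the 0.8 Thrift API (double-check the CfDef fields and keyspace name 
> against your own setup), it looks like:
> 
> import org.apache.cassandra.thrift.Cassandra;
> import org.apache.cassandra.thrift.CfDef;
> import org.apache.cassandra.thrift.KsDef;
> import org.apache.thrift.protocol.TBinaryProtocol;
> import org.apache.thrift.transport.TFramedTransport;
> import org.apache.thrift.transport.TSocket;
> import org.apache.thrift.transport.TTransport;
> 
> // Sketch: walk every column family in a keyspace and set read_repair_chance to 0.
> public class DisableReadRepairChance {
>     public static void main(String[] args) throws Exception {
>         TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
>         Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
>         transport.open();
>         client.set_keyspace("MyKeyspace");                     // placeholder keyspace
> 
>         KsDef ksDef = client.describe_keyspace("MyKeyspace");
>         for (CfDef cfDef : ksDef.getCf_defs()) {
>             cfDef.setRead_repair_chance(0.0);
>             client.system_update_column_family(cfDef);         // one schema change per CF
>         }
>         transport.close();
>     }
> }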
> 
> In any case, just wanted to give some follow-up here, as it's been an 
> inexplicable rock in our backpack, and hopefully this clears up where that 
> setting is actually used.  I'll update the storage configuration wiki to 
> include that caveat as well.
> 
> On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote:
> 
>> Thanks for the insights.  I may first try disabling hinted handoff for one 
>> run of our data pipeline and see if it exhibits the same behavior.  Will 
>> post back if I see anything enlightening there.
>> 
>> On Sep 10, 2011, at 5:04 PM, Chris Goffinet wrote:
>> 
>>> You could tail the commit log with `strings` to see what keys are being 
>>> inserted.
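>>> 
>>> If `strings` isn't convenient, the same idea in a few lines of Java -- the 
>>> path below is only an example; point it at a segment in your commitlog 
>>> directory:
>>> 
>>> import java.io.FileInputStream;
>>> import java.io.IOException;
>>> 
>>> // Rough equivalent of `strings`: print runs of 4+ printable ASCII bytes
>>> // from a commit log segment, to eyeball which keys are being written.
>>> public class CommitLogStrings {
>>>     public static void main(String[] args) throws IOException {
>>>         String path = args.length > 0
>>>                 ? args[0]
>>>                 : "/var/lib/cassandra/commitlog/CommitLog-example.log";  // example path
>>>         StringBuilder run = new StringBuilder();
>>>         try (FileInputStream in = new FileInputStream(path)) {
>>>             int b;
>>>             while ((b = in.read()) != -1) {
>>>                 if (b >= 0x20 && b < 0x7f) {
>>>                     run.append((char) b);
>>>                 } else {
>>>                     if (run.length() >= 4) System.out.println(run);
>>>                     run.setLength(0);
>>>                 }
>>>             }
>>>             if (run.length() >= 4) System.out.println(run);
>>>         }
>>>     }
>>> }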
>>> 
>>> On Sat, Sep 10, 2011 at 2:24 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> Two possibilities:
>>> 
>>> 1) Hinted handoff (this will show up in the logs on the sending
>>> machine; on the receiving one it will just look like any other write)
>>> 
>>> 2) You have something doing writes that you're not aware of; I guess
>>> you could track that down using Wireshark to see where the write
>>> messages are coming from.
>>> 
>>> On Sat, Sep 10, 2011 at 3:56 PM, Jeremy Hanna
>>> <jeremy.hanna1...@gmail.com> wrote:
>>>> Oh and we're running 0.8.4 and the RF is 3.
>>>> 
>>>> On Sep 10, 2011, at 3:49 PM, Jeremy Hanna wrote:
>>>> 
>>>>> In addition, the mutation stage and the read stage are backed up like:
>>>>> 
>>>>> Pool Name                    Active   Pending   Blocked
>>>>> ReadStage                        32       773         0
>>>>> RequestResponseStage              0         0         0
>>>>> ReadRepairStage                   0         0         0
>>>>> MutationStage                   158    525918         0
>>>>> ReplicateOnWriteStage             0         0         0
>>>>> GossipStage                       0         0         0
>>>>> AntiEntropyStage                  0         0         0
>>>>> MigrationStage                    0         0         0
>>>>> StreamStage                       0         0         0
>>>>> MemtablePostFlusher               1         5         0
>>>>> FILEUTILS-DELETE-POOL             0         0         0
>>>>> FlushWriter                       2         5         0
>>>>> MiscStage                         0         0         0
>>>>> FlushSorter                       0         0         0
>>>>> InternalResponseStage             0         0         0
>>>>> HintedHandoff                     0         0         0
>>>>> CompactionManager               n/a        29
>>>>> MessagingService                n/a      0,34
>>>>> 
>>>>> On Sep 10, 2011, at 3:38 PM, Jeremy Hanna wrote:
>>>>> 
>>>>>> We are experiencing massive writes to column families while only doing 
>>>>>> reads from Cassandra.  A set of 5 Hadoop jobs is reading from Cassandra 
>>>>>> and then writing out to HDFS.  That is the only thing operating on the 
>>>>>> cluster.  We are reading at CL.QUORUM with Hadoop, and the data was 
>>>>>> originally written at CL.QUORUM.  Read repair chance is set to 0.0 on 
>>>>>> all column families.  However, in the logs I'm seeing flush after flush 
>>>>>> of memtables, and compactions taking place.  Is there something else 
>>>>>> that would be writing, given the above description?
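>>>>>> 
>>>>>> (For context, by "reading at CL.QUORUM" I just mean ordinary quorum 
>>>>>> reads; a bare-bones Thrift sketch of one is below -- host, keyspace, 
>>>>>> key, and column family names are placeholders, not our actual job code.)
>>>>>> 
>>>>>> import java.nio.ByteBuffer;
>>>>>> import java.util.List;
>>>>>> 
>>>>>> import org.apache.cassandra.thrift.Cassandra;
>>>>>> import org.apache.cassandra.thrift.ColumnOrSuperColumn;
>>>>>> import org.apache.cassandra.thrift.ColumnParent;
>>>>>> import org.apache.cassandra.thrift.ConsistencyLevel;
>>>>>> import org.apache.cassandra.thrift.SlicePredicate;
>>>>>> import org.apache.cassandra.thrift.SliceRange;
>>>>>> import org.apache.thrift.protocol.TBinaryProtocol;
>>>>>> import org.apache.thrift.transport.TFramedTransport;
>>>>>> import org.apache.thrift.transport.TSocket;
>>>>>> import org.apache.thrift.transport.TTransport;
>>>>>> 
>>>>>> // Sketch of a single QUORUM read over Thrift (0.8-era API).
>>>>>> public class QuorumReadExample {
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>         TTransport transport = new TFramedTransport(new TSocket("cassandra-host", 9160));
>>>>>>         Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
>>>>>>         transport.open();
>>>>>>         client.set_keyspace("MyKeyspace");                       // placeholder
>>>>>> 
>>>>>>         SlicePredicate predicate = new SlicePredicate();
>>>>>>         predicate.setSlice_range(new SliceRange(
>>>>>>                 ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 100));
>>>>>> 
>>>>>>         List<ColumnOrSuperColumn> row = client.get_slice(
>>>>>>                 ByteBuffer.wrap("some-key".getBytes("UTF-8")),   // placeholder key
>>>>>>                 new ColumnParent("MyColumnFamily"),              // placeholder CF
>>>>>>                 predicate,
>>>>>>                 ConsistencyLevel.QUORUM);                        // the read CL in question
>>>>>> 
>>>>>>         System.out.println("columns read: " + row.size());
>>>>>>         transport.close();
>>>>>>     }
>>>>>> }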
>>>>>> 
>>>>>> Jeremy
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>> 
>> 
> 
