Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

aaron morton Thu, 21 Jul 2011 04:52:55 -0700

What are you seeing in compactionstats?

You may be seeing some of the effects of 
https://issues.apache.org/jira/browse/CASSANDRA-2280


Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 23:17, Yan Chunlu wrote:

> After trying nodetool -h reagon repair key cf, I found that repairing even a 
> single CF involves rebuilding all sstables (according to nodetool 
> compactionstats). Is that normal? 
> 
> On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton <[email protected]> wrote:
> If you have never run repair, also check the repair section on this page, 
> http://wiki.apache.org/cassandra/Operations, for how frequently it should be 
> run.
> 
> There is an issue where repair can stream too much data, and this can lead to 
> excessive disk use.
> 
> My non-scientific approach to the never-run-repair-before problem is to 
> repair a single CF at a time, starting with the small ones: they are less 
> likely to have differences and will stream the smallest amount of data. 
> 
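The one-CF-at-a-time approach above can be sketched as a small shell loop. 
This is only an illustration: the host, the keyspace name "key" and the CF 
names are placeholders, and the script prints the commands rather than 
running them until NODETOOL is changed.

```shell
#!/bin/sh
# Dry run: prints each nodetool command instead of executing it.
# Change to NODETOOL="nodetool" to run for real.
NODETOOL="echo nodetool"
CASS_HOST="localhost"   # placeholder host
KEYSPACE="key"          # placeholder keyspace name

repair_cfs() {
    # Arguments: column family names, smallest first.
    # Stops at the first failure so a bad repair is not followed by more load.
    for cf in "$@"; do
        $NODETOOL -h "$CASS_HOST" repair "$KEYSPACE" "$cf" || return 1
    done
}

# Hypothetical CF names, ordered from smallest to largest.
repair_cfs small_cf medium_cf large_cf
```

Running the repairs sequentially keeps only one CF streaming at a time, which 
makes it easier to stop if the load climbs again.
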
> If you really want to conserve disk IO during the repair, consider disabling 
> minor compaction by setting the min and max thresholds to 0 via nodetool.
> 
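Disabling minor compaction around a repair, as suggested above, might look 
roughly like this. It assumes the 0.7-style argument order for 
setcompactionthreshold (keyspace, CF, min, max); check "nodetool help" on your 
version. The host, keyspace and CF names are placeholders, and 4 and 32 are 
the usual defaults, so restore whatever values your CF actually used.

```shell
#!/bin/sh
# Dry run: prints each nodetool command instead of executing it.
# Change to NODETOOL="nodetool" to run for real.
NODETOOL="echo nodetool"
CASS_HOST="localhost"   # placeholder host
KEYSPACE="key"          # placeholder keyspace name

set_thresholds() {
    # $1 = column family, $2 = min threshold, $3 = max threshold
    $NODETOOL -h "$CASS_HOST" setcompactionthreshold "$KEYSPACE" "$1" "$2" "$3"
}

set_thresholds cf 0 0                             # disable minor compaction
$NODETOOL -h "$CASS_HOST" repair "$KEYSPACE" cf   # repair the single CF
set_thresholds cf 4 32                            # restore previous thresholds
```

Remember to restore the thresholds afterwards, otherwise sstables will keep 
accumulating without being compacted.
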
> Hope that helps.
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/07/2011, at 11:46 PM, Yan Chunlu <[email protected]> wrote:
> 
>> I just found this:
>> https://issues.apache.org/jira/browse/CASSANDRA-2156
>> 
>> It seems to be available only in 0.8, although someone submitted a patch 
>> for 0.6. I am using 0.7.4; do I need to dig into the code and make my own 
>> patch?
>> 
>> Would adding compaction throttling solve the IO problem? Thanks!
>> 
>> On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu <[email protected]> wrote:
>> When I started using Cassandra, I had no idea that I should run "node 
>> repair" frequently. So basically I have 3 nodes with RF=3 and have not run 
>> node repair for months; the data size is 20G.
>> 
>> The problem is that when I start running node repair now, it eats up all 
>> disk IO and the server load climbs to 20+ and keeps increasing. Worst of 
>> all, the entire cluster slows down and cannot handle requests, so I have to 
>> stop it immediately because it makes my web service unavailable.
>> 
>> The server has an Intel Xeon-Lynnfield 3470 quad-core [2.93GHz] and 8G 
>> memory, with a Western Digital WD RE3 WD1002FBYS SATA disk.
>> 
>> I really have no idea what to do now, as I have already found some data 
>> loss. Any suggestions would be appreciated.
>> 
>> 
>> 
>> -- 
>> 闫春路
> 
> 
> 
> -- 
> 闫春路
