> I also don't understand: if all these nodes are replicas of each other, why
> does the first node have almost double the data?
Have you performed any token moves? Old data is not deleted after a move
unless you run nodetool cleanup.
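If you have moved tokens, something like this on each node that gave up
ranges (<host> is a placeholder for your node):

    # rewrites the SSTables, dropping data the node no longer owns
    nodetool -h <host> cleanup
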
Another possibility is a large backlog of hints, though admittedly it would
have to be a *lot* of hints.
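You can get a rough idea of the hint volume from the system keyspace, e.g.
(assuming the default data directory, <host> again a placeholder):

    # look at "Space used" for HintsColumnFamily under the system keyspace
    nodetool -h <host> cfstats
    ls -lh /var/lib/cassandra/data/system/HintsColumnFamily*
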
The third is that compaction has fallen behind. 
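A quick check for that (same placeholder):

    nodetool -h <host> compactionstats
    # a pending tasks count that keeps growing means compaction
    # is not keeping up with the write load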

> This week it's even worse: the nodetool repair has been running for the
> last 15 hours just on the first node, and when I run nodetool
> compactionstats I constantly see this -
> 
> pending tasks: 3
First check the logs for errors. 

Repair first calculates the differences between the replicas; you can see
this as a validation compaction in nodetool compactionstats.
Then it streams the data, which you can watch with nodetool netstats.
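For example, while the repair is running:

    # validation compactions = the difference calculation phase
    nodetool -h <host> compactionstats
    # streaming progress between the replicas
    nodetool -h <host> netstats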

Try to work out which part is taking the most time. 15 hours for 50GB sounds
like a long time (by the way, do you have compaction enabled?).
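One way to narrow it down (assuming the default log location; the grep
pattern is just a suggestion, since the repair code in this version logs
from AntiEntropyService):

    # shows when each validation and streaming session starts and finishes
    grep -i antientropy /var/log/cassandra/system.log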

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/05/2012, at 3:14 AM, Raj N wrote:

> Hi experts,
> 
> I have a 6 node cluster spread across 2 DCs. 
> 
>     DC          Rack        Status State   Load            Owns    Token
>                                                                    113427455640312814857969558651062452225
>     DC1         RAC13       Up     Normal  95.98 GB        33.33%  0
>     DC2         RAC5        Up     Normal  50.79 GB        0.00%   1
>     DC1         RAC18       Up     Normal  50.83 GB        33.33%  56713727820156407428984779325531226112
>     DC2         RAC7        Up     Normal  50.74 GB        0.00%   56713727820156407428984779325531226113
>     DC1         RAC19       Up     Normal  61.72 GB        33.33%  113427455640312814857969558651062452224
>     DC2         RAC9        Up     Normal  50.83 GB        0.00%   113427455640312814857969558651062452225
> 
> They are all replicas of each other. All reads and writes are done at
> LOCAL_QUORUM. We are on Cassandra 0.8.4. I see that our weekend nodetool
> repair runs for more than 12 hours, especially on the first node, which has
> 96 GB of data. Is this usual? We are using 500 GB SAS drives with an ext4
> file system. This gets worse every week. This week it's even worse: the
> nodetool repair has been running for the last 15 hours just on the first
> node, and when I run nodetool compactionstats I constantly see this -
> 
> pending tasks: 3
> 
> and nothing else. It looks like it's just stuck. There's nothing
> substantial in the logs either. I also don't understand: if all these nodes
> are replicas of each other, why does the first node have almost double the
> data? Any help would be really appreciated.
> 
> Thanks
> -Raj
