> 1) Do I need to treat every node as failed and do a rolling replacement,
> since there might be inconsistencies in the cluster that I have no way to
> find out about?
See
http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
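
The condition that wiki section is about is a gap between repairs longer than
GCGraceSeconds. A minimal sketch of that check, assuming the 10 day default
(864000 seconds) and hypothetical timestamps; `repair_overdue` is an
illustrative helper, not part of Cassandra or nodetool:

```python
from datetime import datetime, timedelta

# Assumption: 864000 seconds (10 days) is the default GCGraceSeconds;
# substitute the value configured for your column families.
GC_GRACE_SECONDS = 864000


def repair_overdue(last_repair, now, gc_grace_seconds=GC_GRACE_SECONDS):
    """Return True if more than gc_grace_seconds passed since last_repair."""
    return (now - last_repair) > timedelta(seconds=gc_grace_seconds)


# Hypothetical timestamps, for illustration only.
now = datetime(2011, 7, 10, 4, 40, 35)
print(repair_overdue(datetime(2011, 6, 25), now))  # last repair over 15 days ago
print(repair_overdue(datetime(2011, 7, 5), now))   # last repair about 5 days ago
```

If the check is true for a node, the situation described at the link above may
apply to it.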

> 2) Is that the reason the node repair hung? The log message says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
> Read timed out
I cannot find that anywhere in the code base. Can you provide some more
information?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Jul 2011, at 03:26, Yan Chunlu wrote:

> I am running RF=2 (I changed it from 2 to 3 and back to 2) on 3 nodes and
> had not run node repair for more than 10 days; I was not aware that this is
> critical. I ran node repair recently and one of the nodes always hangs. From
> the log it seems to be doing nothing related to the repair.
> 
> So I have two problems:
> 
> 1) Do I need to treat every node as failed and do a rolling replacement,
> since there might be inconsistencies in the cluster that I have no way to
> find out about?
> 2) Is that the reason the node repair hung? The log message says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
> Read timed out
> 
> Then nothing.
> 
> thanks!
> 
> On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <peter.schul...@infidyne.com> 
> wrote:
> >> - Have you been running repair consistently ?
> >
> > Nope, only when something breaks
> 
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
> 
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
> 
> 
> --
> / Peter Schuller
> 
> 
> 
> -- 
> 闫春路
