> 1) do I need to treat every node as failed and do a rolling replacement,
> since there might be some inconsistency in the cluster that I have no way
> to find out about?

See http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds

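
As a rough illustration of the check that wiki section implies (a sketch only, not taken from the wiki or the Cassandra code base): compare the time since the last successful repair on a node with GCGraceSeconds. The sketch assumes gc_grace_seconds is still at the stock default of 864000 seconds (10 days); `repair_overdue` is a hypothetical helper name, and the dates in the usage note are invented for the example.

```python
from datetime import datetime, timedelta

# Assumption: 864000 s (10 days) is the stock gc_grace_seconds default.
# Substitute the value actually configured for your column families.
GC_GRACE_SECONDS = 864000


def repair_overdue(last_repair, now, gc_grace_seconds=GC_GRACE_SECONDS):
    """Return True if more than gc_grace_seconds have passed since last_repair."""
    return (now - last_repair) > timedelta(seconds=gc_grace_seconds)
```

For example, `repair_overdue(datetime(2011, 6, 28), datetime(2011, 7, 10))` covers a 12-day gap and so reports the repair as overdue, while a 5-day gap does not.
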
> 2) is that the reason the node repair hung? the log message says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out

I cannot find that anywhere in the code base. Can you provide some more information?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Jul 2011, at 03:26, Yan Chunlu wrote:

> I am running RF=2 (I changed it from 2 to 3 and back to 2) on 3 nodes and
> have not run node repair for more than 10 days; I was not aware that this
> is critical. I ran node repair recently and one of the nodes always hangs.
> From the log it seems to be doing nothing related to the repair.
>
> So I have two problems:
>
> 1) do I need to treat every node as failed and do a rolling replacement,
> since there might be some inconsistency in the cluster that I have no way
> to find out about?
> 2) is that the reason the node repair hung? the log message says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> then nothing.
>
> thanks!
>
> On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <peter.schul...@infidyne.com>
> wrote:
> >> - Have you been running repair consistently ?
> >
> > Nope, only when something breaks
>
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
>
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
>
>
> --
> / Peter Schuller
>
>
>
> --
> 闫春路