I am running RF=2 (I changed it from 2 to 3 and then back to 2) on 3 nodes, and I have not run node repair for more than 10 days; I was not aware that this is critical. I ran node repair recently, and one of the nodes always hangs... from the log, it does not seem to be doing anything related to the repair.
So I have two questions:

1) Do I need to treat every node as failed and do a rolling replacement, since there might be inconsistencies in the cluster that I have no way of detecting?
2) Is the missed repair the reason the node repair hung? The log message says:

Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out

and then nothing.

Thanks!

On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> >> - Have you been running repair consistently ?
> >
> > Nop, only when something breaks
>
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
>
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> --
> / Peter Schuller

--
闫春路