When I run repair on a node in my 0.7.6-2 cluster, the repair starts to stream data and activity is seen in the logs.
However, after a while (a day or so) everything seems to freeze up. The repair command is still running (the command prompt has not returned), and netstats shows output similar to that below: all streams at 0% and nothing happening. The logs indicate that things were started, but there is no indication of whether anything is in fact still active. For example, these are the last log entries related to repair, from just this morning:

INFO [StreamStage:1] 2011-06-09 07:13:21,423 StreamOut.java (line 173) Stream context metadata [/var/lib/cassandra/data/DFS/main-f-144-Data.db sections=2 progress=0/31947748 - 0%, /var/lib/cassandra/data/DFS/main-f-145-Data.db sections=2 progress=0/25786564 - 0%, /var/lib/cassandra/data/DFS/main-f-143-Data.db sections=2 progress=0/5830103399 - 0%], 9 sstables.
INFO [StreamStage:1] 2011-06-09 07:13:21,423 StreamOutSession.java (line 174) Streaming to /10.46.108.104

However, netstats on all related nodes looks something like the output below. The nodes continue to handle read/write requests just fine, and they are not overloaded at all.

Any advice would be greatly appreciated. Because repairs never seem to finish, I have a feeling we have a lot of garbage data in our cluster.

/opt/cassandra/bin/nodetool -h $HOSTNAME -p 35014 netstats
Mode: Normal
Not sending any streams.
Streaming from: /10.46.108.104
   DFS: /var/lib/cassandra/data/DFS/main-f-209-Data.db sections=2 progress=0/276461810 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-153-Data.db sections=2 progress=0/100340568 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-40-Data.db sections=2 progress=0/62726190502 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-180-Data.db sections=1 progress=0/158898493 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-109-Data.db sections=2 progress=0/87250515569 - 0%
Streaming from: /10.47.108.102
   DFS: /var/lib/cassandra/data/DFS/main-f-304-Data.db sections=2 progress=0/13563864214 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-350-Data.db sections=1 progress=0/2877129955 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-379-Data.db sections=2 progress=0/143804948 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-370-Data.db sections=2 progress=0/683716174 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-371-Data.db sections=2 progress=0/56650 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-368-Data.db sections=2 progress=0/4005533616 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-369-Data.db sections=2 progress=0/155515922 - 0%
Streaming from: /10.46.108.103
   DFS: /var/lib/cassandra/data/DFS/main-f-888-Data.db sections=2 progress=0/158096259 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-828-Data.db sections=1 progress=0/29508276 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-886-Data.db sections=2 progress=0/133704150 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-759-Data.db sections=2 progress=0/83629797522 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-889-Data.db sections=2 progress=0/96903803 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-751-Data.db sections=2 progress=0/17944852950 - 0%
Streaming from: /10.46.108.101
   DFS: /var/lib/cassandra/data/DFS/main-f-1318-Data.db sections=2 progress=0/60617216778 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-1179-Data.db sections=2 progress=0/11870790009 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-1324-Data.db sections=2 progress=0/710603722 - 0%
   DFS: /var/lib/cassandra/data/DFS/main-f-1322-Data.db sections=2 progress=0/5844992187 - 0%
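
To confirm that the streams are truly stuck rather than just slow, one option is to capture the netstats output twice, some minutes apart, and compare the progress counters. Below is a rough Python sketch of that comparison. It assumes the output has the same shape as what is pasted above ("Streaming from: /ip" followed by one "progress=done/total" line per file); the function names `parse_netstats` and `stalled` are my own and are not part of Cassandra or nodetool.

```python
import re

# Matches per-file lines in the format shown above, for example:
#   DFS: /var/lib/cassandra/data/DFS/main-f-209-Data.db sections=2 progress=0/276461810 - 0%
FILE_RE = re.compile(r"^\s*(\S+): (\S+) sections=\d+ progress=(\d+)/(\d+) - \d+%\s*$")
# Matches the peer header lines, for example: Streaming from: /10.46.108.104
PEER_RE = re.compile(r"^\s*Streaming (?:from|to): /(\S+)\s*$")


def parse_netstats(text):
    """Return {(peer, file_path): (bytes_done, bytes_total)} for one netstats capture."""
    streams = {}
    peer = None
    for line in text.splitlines():
        header = PEER_RE.match(line)
        if header:
            peer = header.group(1)
            continue
        entry = FILE_RE.match(line)
        if entry and peer is not None:
            path = entry.group(2)
            streams[(peer, path)] = (int(entry.group(3)), int(entry.group(4)))
    return streams


def stalled(before, after):
    """List unfinished streams whose byte count did not change between two captures."""
    return sorted(
        key
        for key, (done, total) in after.items()
        if key in before and before[key][0] == done and done < total
    )
```

Feeding two saved captures through `parse_netstats` and passing the results to `stalled` lists the (peer, file) pairs that made no progress in between. In my case I would expect every stream shown above to appear in that list.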