My by-now infamous eight-node cluster running 0.7.0beta3+ dropped many replication MUTATEs during load, so I decided to fix replication copies with a "nodetool repair" on one of the nodes (X.21). The repair has been running for two days, and has finally gotten itself wedged into a state where it can't proceed.
The log on X.21 continually describes the need to stream a data file, unsuccessfully. From other clues below I gather this is a receive stream. This message repeated many, many times, several per second, but has now stopped:

INFO [Thread-13877] 2010-11-14 09:17:35,207 StreamInSession.java (line 124) Streaming of file /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079) progress=90112/219682197079 - 0% from org.apache.cassandra.streaming.streaminsess...@3ee3da2c failed: requesting a retry.

Here's the best joke, though: "nodetool -h X.20 netstats" shows that the given stream has been attempted a few times and is still being attempted, but in a broken way, such that the progress percentage has gone way past 100%. It's now at 1096% and still rising.

I'm not rebooting, so that I can poke around as devs suggest. I'm also not sending logs to the list, at least in part because they're, well, big. If any developers want them, though, I'm happy to send them.

-------------------------------------------------------------------------------------------------------------------
Mode: Normal
Streaming to: /X.21
   /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079) progress=2408587638800/219682197079 - 1096%   <---- see this
   /var/lib/cassandra/data/Attrs/TestAttrs-e-386-Data.db/(0,182528797) progress=0/182528797 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,908075169) progress=0/908075169 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-382-Data.db/(0,784362565) progress=0/784362565 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-378-Data.db/(0,896956312) progress=0/896956312 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-367-Data.db/(0,894019840) progress=0/894019840 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-380-Data.db/(0,901377643) progress=0/901377643 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-333-Data.db/(0,22306924) progress=0/22306924 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-369-Data.db/(0,888814566) progress=0/888814566 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-374-Data.db/(0,889095219) progress=0/889095219 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-375-Data.db/(0,893034298) progress=0/893034298 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-389-Data.db/(0,371718620) progress=0/371718620 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-319-Data.db/(0,14172830870) progress=0/14172830870 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-283-Data.db/(0,8939407316) progress=0/8939407316 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-376-Data.db/(0,897417147) progress=0/897417147 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,357220526) progress=0/357220526 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-366-Data.db/(0,899103394) progress=0/899103394 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-377-Data.db/(0,898165901) progress=0/898165901 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-331-Data.db/(0,13323957368) progress=0/13323957368 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-373-Data.db/(0,892116147) progress=0/892116147 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-383-Data.db/(0,28216239303) progress=0/28216239303 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-330-Data.db/(0,307921317) progress=0/307921317 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,185277927) progress=0/185277927 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-372-Data.db/(0,893683568) progress=0/893683568 - 0%
Streaming from: /X.21
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-440-Data.db/(0,176842211) progress=0/176842211 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,447272883) progress=0/447272883 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-412-Data.db/(0,444440243) progress=0/444440243 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-328-Data.db/(0,14275850800) progress=0/14275850800 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-397-Data.db/(0,31878407176) progress=0/31878407176 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-393-Data.db/(0,446800028) progress=0/446800028 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-439-Data.db/(0,367116560) progress=0/367116560 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,445241132) progress=0/445241132 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-271-Data.db/(0,4497953871) progress=0/4497953871 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-396-Data.db/(0,449662908) progress=0/449662908 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-409-Data.db/(0,454101872) progress=0/454101872 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,447381444) progress=0/447381444 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-327-Data.db/(0,208633237) progress=0/208633237 - 0%
Pool Name     Active   Pending   Completed
Commands         n/a         0   709910236
Responses        n/a         0   363174385
-------------------------------------------------------------------------------------------------------------------

Meanwhile, "nodetool -h X.21 netstats" shows a large number of transfers that are at 0% and haven't moved, AFAICT, for at least an hour:

-------------------------------------------------------------------------------------------------------------------
Mode: Normal
Streaming to: /X.20
   /var/lib/cassandra/data/Attrs/TestAttrs-e-327-Data.db/(0,208633237) progress=0/208633237 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-439-Data.db/(0,367116560) progress=0/367116560 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-271-Data.db/(0,4497953871) progress=0/4497953871 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-412-Data.db/(0,444440243) progress=0/444440243 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-440-Data.db/(0,176842211) progress=0/176842211 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-328-Data.db/(0,14275850800) progress=0/14275850800 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,445241132) progress=0/445241132 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-397-Data.db/(0,31878407176) progress=0/31878407176 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,447272883) progress=0/447272883 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-393-Data.db/(0,446800028) progress=0/446800028 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-396-Data.db/(0,449662908) progress=0/449662908 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,447381444) progress=0/447381444 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-409-Data.db/(0,454101872) progress=0/454101872 - 0%
Streaming to: /X.22
   /var/lib/cassandra/data/Attrs/TestAttrs-e-350-Data.db/(0,887780227) progress=0/887780227 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-341-Data.db/(0,885896138) progress=0/885896138 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-368-Data.db/(0,892560053) progress=0/892560053 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-358-Data.db/(0,888436251) progress=0/888436251 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-367-Data.db/(0,893446845) progress=0/893446845 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-354-Data.db/(0,889058842) progress=0/889058842 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-90-Data.db/(0,61505031301) progress=0/61505031301 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-331-Data.db/(0,887620464) progress=0/887620464 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-361-Data.db/(0,890820399) progress=0/890820399 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-345-Data.db/(0,887535512) progress=0/887535512 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-329-Data.db/(0,16876107370) progress=0/16876107370 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-364-Data.db/(0,893839028) progress=0/893839028 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-356-Data.db/(0,891862436) progress=0/891862436 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,886276363) progress=0/886276363 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-292-Data.db/(0,388239771) progress=0/388239771 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-379-Data.db/(0,907731463) progress=0/907731463 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-348-Data.db/(0,893114355) progress=0/893114355 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-371-Data.db/(0,888682755) progress=0/888682755 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-338-Data.db/(0,885144435) progress=0/885144435 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-340-Data.db/(0,890937418) progress=0/890937418 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-233-Data.db/(0,33902556016) progress=0/33902556016 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-376-Data.db/(0,897426603) progress=0/897426603 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-366-Data.db/(0,888711957) progress=0/888711957 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-327-Data.db/(0,208633237) progress=0/208633237 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-369-Data.db/(0,893954909) progress=0/893954909 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-377-Data.db/(0,897265056) progress=0/897265056 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-339-Data.db/(0,888998653) progress=0/888998653 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-378-Data.db/(0,901053427) progress=0/901053427 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-271-Data.db/(0,4497953871) progress=0/4497953871 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-343-Data.db/(0,891732427) progress=0/891732427 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-357-Data.db/(0,888267065) progress=0/888267065 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-335-Data.db/(0,889998928) progress=0/889998928 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-365-Data.db/(0,888528931) progress=0/888528931 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-328-Data.db/(0,14275850800) progress=0/14275850800 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-355-Data.db/(0,893535664) progress=0/893535664 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-347-Data.db/(0,891375566) progress=0/891375566 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-375-Data.db/(0,897994571) progress=0/897994571 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-373-Data.db/(0,897589898) progress=0/897589898 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-336-Data.db/(0,891079134) progress=0/891079134 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-372-Data.db/(0,892852094) progress=0/892852094 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-337-Data.db/(0,885983148) progress=0/885983148 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-346-Data.db/(0,886424157) progress=0/886424157 - 0%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-353-Data.db/(0,889222127) progress=0/889222127 - 0%
Streaming from: /X.20
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,357220526) progress=0/357220526 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-389-Data.db/(0,371718620) progress=0/371718620 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-369-Data.db/(0,888814566) progress=0/888814566 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-374-Data.db/(0,889095219) progress=0/889095219 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-330-Data.db/(0,307921317) progress=0/307921317 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-331-Data.db/(0,13323957368) progress=0/13323957368 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-383-Data.db/(0,28216239303) progress=0/28216239303 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-319-Data.db/(0,14172830870) progress=0/14172830870 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-366-Data.db/(0,899103394) progress=0/899103394 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-367-Data.db/(0,894019840) progress=0/894019840 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,185277927) progress=0/185277927 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-378-Data.db/(0,896956312) progress=0/896956312 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-372-Data.db/(0,893683568) progress=0/893683568 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-386-Data.db/(0,182528797) progress=0/182528797 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-333-Data.db/(0,22306924) progress=0/22306924 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-376-Data.db/(0,897417147) progress=0/897417147 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-283-Data.db/(0,8939407316) progress=0/8939407316 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,908075169) progress=0/908075169 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-373-Data.db/(0,892116147) progress=0/892116147 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-377-Data.db/(0,898165901) progress=0/898165901 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079) progress=0/219682197079 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-382-Data.db/(0,784362565) progress=0/784362565 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-375-Data.db/(0,893034298) progress=0/893034298 - 0%
   Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-380-Data.db/(0,901377643) progress=0/901377643 - 0%
Nothing streaming from /10.5.5.22
Pool Name     Active   Pending   Completed
Commands         n/a         0   433633667
Responses        n/a         0   402612386
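
For anyone who wants to scan netstats output like the above without eyeballing it, here is a minimal Python sketch that picks out the per-file progress entries and flags the ones that have overrun their size or have not moved. It assumes only the line format pasted in this message; the names parse_progress and classify are hypothetical helpers of mine, not part of Cassandra or nodetool, and the percentage is recomputed with integer division, which matches the figures shown here but is an assumption about how the tool rounds.

```python
import re

# Matches one per-file entry in the format pasted above, e.g.
#   /var/.../TestAttrs-e-332-Data.db/(0,219682197079) progress=90112/219682197079 - 0%
# An optional "Attrs: " prefix or trailing text is ignored because we use search().
PROGRESS_RE = re.compile(
    r"(?P<path>/\S+?-Data\.db)/\((?P<start>\d+),(?P<end>\d+)\)\s+"
    r"progress=(?P<done>\d+)/(?P<total>\d+)\s+-\s+(?P<pct>\d+)%"
)


def parse_progress(line):
    """Return a dict for a progress entry, or None if the line is not one."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done = int(m.group("done"))
    total = int(m.group("total"))
    return {
        "path": m.group("path"),
        "done": done,
        "total": total,
        # Assumption: truncating integer percentage, recomputed from the byte counts.
        "percent": done * 100 // total if total else 0,
    }


def classify(entry):
    """Label an entry: 'overrun' (more bytes than the file has), 'idle', or 'active'."""
    if entry["done"] > entry["total"]:
        return "overrun"
    if entry["done"] == 0:
        return "idle"
    return "active"


if __name__ == "__main__":
    sample = [
        "/var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079) "
        "progress=2408587638800/219682197079 - 1096%",
        "Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-386-Data.db/(0,182528797) "
        "progress=0/182528797 - 0%",
        "Mode: Normal",
    ]
    for line in sample:
        entry = parse_progress(line)
        if entry is not None:
            print(classify(entry), entry["percent"], entry["path"])
```

Feeding it the saved output of "nodetool -h X.20 netstats" line by line would report the e-332 transfer as overrun and everything else as idle, which is the state described above.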