Hi, I'm trying to run a repair on a node in my Cassandra cluster (version 3.7), and I'm hoping someone can shed light on an error message that keeps cropping up.
I started the repair after discovering that the node had somehow become partitioned from the rest of the cluster: nodetool status on every other node showed it as DN, and on the node itself every other node showed as DN. After I restarted the Cassandra daemon, the node seemed to rejoin the cluster just fine, so I began a repair. The repair has now been running for about 33 hours (it's the first incremental repair on this cluster), and every so often I see a line like this:

[2017-08-31 00:18:16,300] Repair session f7ae4e71-8ce3-11e7-b466-79eba0383e4f for range [(-5606588017314999649,-5604469721630340065], (9047587767449433379,9047652965163017217]] failed with error Endpoint /20.0.122.204 died (progress: 9%)

Every one of these lines refers to the same node, 20.0.122.204.

I'm mostly looking for guidance here. Do these errors indicate that the entire repair will be worthless, or only the repair of the token ranges shared by those two nodes? Is it normal to see error messages of this nature, and for a repair not to terminate?

Thanks,
Paul
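In case it's useful context, here is roughly how I've been checking which nodes are marked down. This is just a sketch: the sample nodetool status output, addresses, and host IDs below are made up for illustration, and in practice I pipe the real command (nodetool status) into the awk filter instead of a here-string.

```shell
# Hypothetical sample of `nodetool status` output (fields abbreviated).
# Real usage would be:  nodetool status | awk '$1 == "DN" { print $2 }'
sample='Datacenter: dc1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load     Tokens  Owns  Host ID  Rack
UN  20.0.122.201  1.1 GiB  256     ?     aaaa     r1
DN  20.0.122.204  1.2 GiB  256     ?     bbbb     r1'

# Print the address of every node reported down (status "DN").
echo "$sample" | awk '$1 == "DN" { print $2 }'
```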