Michael, thanks for the input. I don't think I'm going to need to upgrade to 3.11 for the sake of getting nodetool repair working for me. Instead, I have another plausible explanation and solution for my particular situation.
First, I should say that disk usage proved to be a red herring. There was plenty of disk space available. When I said that the error message I was seeing was no more precise than "Some repair failed," I misstated things. Just above that error message was a further detail: "Validation failed in /(IP address of host)." Of course, that's still vague. What validation failed? However, that extra information led me to this JIRA ticket: https://issues.apache.org/jira/browse/CASSANDRA-10057. In particular this comment: "If you invoke repair on multiple node at once, this can be happen. Can you confirm? And once it happens, the error will continue unless you restart the node since some resources remain due to the hang. I will post the patch not to hang."

Now, the particular symptom that comment refers to is not what I was seeing, but it got me thinking that perhaps my failures came from attempting to run "nodetool repair --partitioner-range" simultaneously on all the nodes in my cluster. These are only three-node dev clusters, and what I would see is that the repair would pass on one node but fail on the other two. So I tried running the repairs sequentially, one node at a time. With this change the repair works, and I have every expectation that it will continue to work; running the repairs sequentially appears to be the solution to my particular problem. If that's the case and repairs are now intended to be run sequentially, then that constitutes a contract change for nodetool repair. This is the first time I've run a repair on a multi-node Cassandra 3.10 cluster, and only with 3.10 have I seen this problem; I never saw it running repairs on the Cassandra 2.1 clusters I was upgrading from. The last comment on that JIRA ticket comes from someone reporting the same problem I'm seeing, and their experience indirectly corroborates mine, or at least doesn't contradict it. (I've pasted a rough sketch of the sequential loop below the quoted message.)

On Thu, Jul 27, 2017 at 10:26 AM, Michael Shuler <mich...@pbandjelly.org> wrote:

> On 07/27/2017 12:10 PM, Mitch Gitman wrote:
> > I'm using Apache Cassandra 3.10.
> <snip>
> > this is a dev cluster I'm talking about.
> <more snippage>
> > Further insights welcome...
>
> Upgrade and see if one of the many fixes for 3.11.0 helped?
>
> https://github.com/apache/cassandra/blob/cassandra-3.11.0/CHANGES.txt#L1-L129
>
> If you can reproduce on 3.11.0, hit JIRA with the steps to repro. There
> are several bug fixes committed to the cassandra-3.11 branch, pending a
> 3.11.1 release, but I don't see one that's particularly relevant to your
> trace.
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/CHANGES.txt
>
> --
> Kind regards,
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
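P.S. For anyone who runs into the same thing, here's roughly what the sequential approach looks like as a shell loop. This is just a sketch: the host names are placeholders for my dev cluster, and it assumes passwordless SSH to each node.

  #!/usr/bin/env bash
  # Run a partitioner-range repair on one node at a time instead of kicking
  # off repairs on all nodes in parallel. Host names below are placeholders.
  set -e
  for host in cass-dev-1 cass-dev-2 cass-dev-3; do
    echo "Repairing ${host}..."
    # nodetool repair runs in the foreground and only returns once the repair
    # session finishes (or fails), so this loop serializes the repairs.
    ssh "${host}" nodetool repair --partitioner-range
  done

Assuming nodetool exits non-zero when a repair fails, the "set -e" should stop the loop at the first failure rather than piling further repairs on top of it.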