Hi,

On AWS, we had a 2 node cluster with RF 2.
We added 2 more nodes, then changed RF to 3 on all our keyspaces.
Next step was to run nodetool repair, node by node.
(In the meantime, we found that we must use  CL quorum, which is affecting
our application's performance).
Started with node 1, which is one of the old nodes.
Ran:
nodetool repair -pr

It seemed to be progressing fine, running keyspace by keyspace, for about
an hour, but then it hung. The last messages in the output are:

[2013-12-01 11:18:24,577] Repair command #4 finished
[2013-12-01 11:18:24,594] Starting repair command #5, repairing 230 ranges
for keyspace correlor_customer_766

It stayed like this for almost 24 hours. Then we read about the possibility
of this being related to not upgrading
sstables<http://comments.gmane.org/gmane.comp.db.cassandra.user/31939>,
so we killed the process. We were not sure whether we had run upgrade
sstables (we upgraded from 1.2.4 a couple of months ago)

So:
Ran upgradesstables on a specific table in the keyspace that repair got
stuck on. (this was fast)
nodetool upgradesstables correlor_customer_766 users
Ran repair on that same table.
nodetool repair correlor_customer_766 users -pr

This is again hanging.
The first and only output from this process is:
[2013-12-02 08:22:41,221] Starting repair command #6, repairing 230 ranges
for keyspace correlor_customer_766

Nothing else happened for more than an hour.

Any help and advice will be greatly appreciated.

Tamar Rosen

correlor.com

Reply via email to