> I changed logging to debug level, but still nothing is logged.
> Again - any help will be appreciated.

There is nothing at the ERROR level on any machine?
Check nodetool compactionstats to see if a validation compaction is running; the repair may be waiting on this.

Check nodetool netstats to see if streams are being exchanged, then check the logs on those machines.

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 4/12/2013, at 10:24 pm, Tamar Rosen <ta...@correlor.com> wrote:

> Update - I am still experiencing the above issues, but not all the time. I
> was able to run repair (on this keyspace) from node 2 and from node 4, but
> now a different keyspace hangs on these nodes, and I am still not able to run
> repair on node 1. It seems random. I changed logging to debug level, but
> still nothing is logged.
> Again - any help will be appreciated.
>
> Tamar
>
>
> On Mon, Dec 2, 2013 at 11:53 AM, Tamar Rosen <ta...@correlor.com> wrote:
> Hi,
>
> On AWS, we had a 2 node cluster with RF 2.
> We added 2 more nodes, then changed RF to 3 on all our keyspaces.
> Next step was to run nodetool repair, node by node.
> (In the meantime, we found that we must use CL quorum, which is affecting
> our application's performance).
> Started with node 1, which is one of the old nodes.
> Ran:
> nodetool repair -pr
>
> It seemed to be progressing fine, running keyspace by keyspace, for about an
> hour, but then it hung. The last messages in the output are:
>
> [2013-12-01 11:18:24,577] Repair command #4 finished
> [2013-12-01 11:18:24,594] Starting repair command #5, repairing 230 ranges for keyspace correlor_customer_766
>
> It stayed like this for almost 24 hours. Then we read about the possibility
> of this being related to not upgrading sstables, so we killed the process. We
> were not sure whether we had run upgradesstables (we upgraded from 1.2.4 a
> couple of months ago).
>
> So:
> Ran upgradesstables on a specific table in the keyspace that repair got stuck
> on (this was fast):
> nodetool upgradesstables correlor_customer_766 users
> Ran repair on that same table:
> nodetool repair correlor_customer_766 users -pr
>
> This is again hanging.
> The first and only output from this process is:
> [2013-12-02 08:22:41,221] Starting repair command #6, repairing 230 ranges for keyspace correlor_customer_766
>
> Nothing else happened for more than an hour.
>
> Any help and advice will be greatly appreciated.
>
> Tamar Rosen
>
> correlor.com
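
The checks suggested in the reply at the top of this message (compactionstats, netstats, then the logs) could be gathered into a small helper to run on each node while the repair appears hung. This is only a sketch: the function name `check_repair_state` is made up for illustration, and the default log path is an assumption (a common location for package installs), not something stated in the thread.

```shell
# Sketch: run the suggested checks on the local node.
# Assumptions: nodetool is on PATH; the log path below is a guess at a
# typical default and should be replaced with the node's real system.log.
check_repair_state() {
    log="${1:-/var/log/cassandra/system.log}"

    # A running validation compaction means the repair may simply be waiting.
    echo "== nodetool compactionstats =="
    nodetool compactionstats

    # Active streams show whether data is being exchanged with other nodes.
    echo "== nodetool netstats =="
    nodetool netstats

    # Show the most recent ERROR lines from the log, if any.
    echo "== last ERROR lines in $log =="
    if [ -r "$log" ]; then
        errors=$(grep 'ERROR' "$log" | tail -n 20)
        if [ -n "$errors" ]; then
            printf '%s\n' "$errors"
        else
            echo "(no ERROR lines found)"
        fi
    else
        echo "(cannot read $log)"
    fi
}
```

After defining the function in a shell on a node, call `check_repair_state` with no arguments, or pass the path to that node's log as the first argument. Running it on every node involved in the repair matters, since the validation compaction or the stream may be stuck on a neighbour rather than on the node where repair was started.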