> I changed logging to debug level, but still nothing is logged. 
> Again - any help will be appreciated. 
Is there nothing at the ERROR level on any machine?
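
A quick way to look for these on each node is sketched below. The log path is an assumption (the usual location for packaged installs), and `recent_errors` is just an illustrative helper name, so adjust both to your setup.

```shell
# Assumed default log location for packaged installs; change it if yours differs.
LOG_FILE="${LOG_FILE:-/var/log/cassandra/system.log}"

# Print the most recent ERROR lines from a Cassandra system log.
# $1: path to the log file
# $2: how many lines to show (defaults to 20)
recent_errors() {
    grep 'ERROR' "$1" | tail -n "${2:-20}"
}
```

Run `recent_errors "$LOG_FILE"` on every node, not only the one where the repair was started, since the other replicas take part in the repair as well.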

Check nodetool compactionstats to see if a validation compaction is running; 
the repair may be waiting on it. 

Check nodetool netstats to see if streams are being exchanged, then check the 
logs on those machines. 
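
The two checks above could be run across the cluster with something like the sketch below. The host names in `NODES` are placeholders and `check_repair_progress` is an illustrative name; it also assumes JMX is reachable from the machine you run it on, otherwise run the two nodetool commands locally on each node.

```shell
# Placeholder host list; replace with the addresses of your own nodes.
NODES="${NODES:-node1 node2 node3 node4}"

# Print compaction and streaming status for every node in NODES.
check_repair_progress() {
    for host in $NODES; do
        echo "== $host: compactionstats =="
        # A running validation compaction shows up in this output.
        nodetool -h "$host" compactionstats
        echo "== $host: netstats =="
        # Active streaming sessions show up in this output.
        nodetool -h "$host" netstats
    done
}
```

If neither a validation compaction nor any streaming shows up on any node while the repair command is still blocked, the logs on the replicas for the stuck ranges are the next place to look.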

cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 4/12/2013, at 10:24 pm, Tamar Rosen <ta...@correlor.com> wrote:

> Update - I am still experiencing the above issues, but not all the time. I 
> was able to run repair (on this keyspace) from node 2 and from node 4, but 
> now a different keyspace hangs on these nodes, and I am still not able to run 
> repair on node 1. It seems random. I changed logging to debug level, but 
> still nothing is logged. 
> Again - any help will be appreciated. 
> 
> Tamar
> 
> 
> On Mon, Dec 2, 2013 at 11:53 AM, Tamar Rosen <ta...@correlor.com> wrote:
> Hi,
> 
> On AWS, we had a 2 node cluster with RF 2. 
> We added 2 more nodes, then changed RF to 3 on all our keyspaces. 
> Next step was to run nodetool repair, node by node. 
> (In the meantime, we found that we must use CL quorum, which is affecting 
> our application's performance).
> Started with node 1, which is one of the old nodes.
> Ran:
> nodetool repair -pr
> 
> It seemed to be progressing fine, running keyspace by keyspace, for about an 
> hour, but then it hung. The last messages in the output are:
>  
> [2013-12-01 11:18:24,577] Repair command #4 finished
> [2013-12-01 11:18:24,594] Starting repair command #5, repairing 230 ranges 
> for keyspace correlor_customer_766
> 
> It stayed like this for almost 24 hours. Then we read about the possibility 
> of this being related to not upgrading sstables, so we killed the process. We 
> were not sure whether we had run upgradesstables (we upgraded from 1.2.4 a 
> couple of months ago).
> 
> So:
> Ran upgradesstables on a specific table in the keyspace that repair got stuck 
> on. (this was fast)
> nodetool upgradesstables correlor_customer_766 users
> Ran repair on that same table. 
> nodetool repair correlor_customer_766 users -pr
> 
> This is again hanging. 
> The first and only output from this process is:
> [2013-12-02 08:22:41,221] Starting repair command #6, repairing 230 ranges 
> for keyspace correlor_customer_766
> 
> Nothing else happened for more than an hour. 
> 
> Any help and advice will be greatly appreciated.
> 
> Tamar Rosen
> 
> correlor.com
