Hi,

I'm currently testing the restore of a Cassandra 1.1.2 snapshot.

The steps to reproduce the problem:

 - snapshot a 3-node production cluster (1.1.2) with RF=3 and LCS (leveled 
compaction) ==> 8GB data/node
 - create a new 3-node cluster (node1,2,3)
 - stop node1 / copy data (SSTables) from the snapshot (just one node) / start 
node1
 - Cassandra opens 1185 SSTable files (*-hd-XXXX); pending compaction 
tasks: 247
 - before Cassandra starts compacting, run:  nodetool repair -pr
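
To double-check the SSTable count from the steps above, something like the 
following Python sketch can be used. It assumes that each SSTable has exactly 
one "-Data.db" component and that the file names look like 
<keyspace>-<cf>-hd-<generation>-Data.db; both the naming pattern and the 
function name count_sstables are assumptions for illustration, not taken 
from the Cassandra tooling.

```python
import re
from pathlib import Path

# Assumed naming: anything ending in -hd-<generation>-Data.db counts as one SSTable.
DATA_FILE = re.compile(r".*-hd-(\d+)-Data\.db$")


def count_sstables(cf_dir):
    """Count SSTables in a column family directory by their -Data.db files."""
    return sum(
        1
        for p in Path(cf_dir).iterdir()
        if p.is_file() and DATA_FILE.match(p.name)
    )


# Hypothetical usage (the path depends on the local data_file_directories setting):
# print(count_sstables("/var/lib/cassandra/data/highscores/highscore"))
```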

The error messages in system.log :

 INFO [AntiEntropySessions:1] 2012-07-20 10:53:16,743 AntiEntropyService.java 
(line 666) [repair #1c59b930-d259-11e1-0000-a0b0843ee1fe] new session: will 
sync /10.241.65.232, /10.54.26.250, /10.251.33.166 on range 
(113427455640312821154458202477256070485,0] for highscores.[highscore]
 INFO [AntiEntropySessions:1] 2012-07-20 10:53:16,747 AntiEntropyService.java 
(line 871) [repair #1c59b930-d259-11e1-0000-a0b0843ee1fe] requesting merkle 
trees for highscore (to [/10.54.26.250, /10.251.33.166, /10.241.65.232])
 INFO [AntiEntropyStage:1] 2012-07-20 10:53:17,085 AntiEntropyService.java 
(line 206) [repair #1c59b930-d259-11e1-0000-a0b0843ee1fe] Received merkle tree 
for highscore from /10.54.26.250
 INFO [AntiEntropyStage:1] 2012-07-20 10:53:17,104 AntiEntropyService.java 
(line 206) [repair #1c59b930-d259-11e1-0000-a0b0843ee1fe] Received merkle tree 
for highscore from /10.251.33.166
ERROR [ValidationExecutor:1] 2012-07-20 10:53:17,865 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[ValidationExecutor:1,1,main]
java.lang.StackOverflowError
        at com.google.common.collect.Sets$1.iterator(Sets.java:578)    ....  
(repeating 1024 times) 

The repair command never returns. 
It increases the Active/Pending counters of "AntiEntropySessions" in 
tpstats, and these counters never go back to 0.
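
The tpstats check can be scripted roughly as below. This is only a sketch: it 
assumes the usual column order of "nodetool tpstats" (pool name, Active, 
Pending, Completed, ...), and the SAMPLE text contains made-up numbers for 
illustration, not output from the affected node. The function names are 
hypothetical.

```python
def parse_tpstats(text):
    """Return {pool_name: (active, pending)} from tpstats-style text."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Pool rows have a name followed by numeric Active/Pending/Completed
        # columns; header lines and the dropped-message section are skipped.
        if len(parts) >= 4 and parts[1].isdigit() and parts[2].isdigit():
            pools[parts[0]] = (int(parts[1]), int(parts[2]))
    return pools


def repair_sessions_outstanding(text):
    """True if AntiEntropySessions shows a non-zero Active or Pending count."""
    active, pending = parse_tpstats(text).get("AntiEntropySessions", (0, 0))
    return active > 0 or pending > 0


# Illustrative sample only (invented numbers).
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0            120         0                 0
AntiEntropySessions               1         1              0         0                 0
"""

if __name__ == "__main__":
    print(repair_sessions_outstanding(SAMPLE))
```

A single non-zero reading is normal while a repair is running; the problem 
described here is that the counters stay non-zero across repeated checks.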

After some time, compaction starts as usual without problems.

Am I doing something wrong? The error only occurs with LCS; there is no 
problem with STCS (size-tiered compaction).
There is plenty of space in the Java heap (7G) and on the disk (1.7TB). 
RAM is 15G and swap is 20G. This is an Amazon m1.xlarge instance running 
Ubuntu Lucid.

Thanks for any hints or help,
Rudolf VanderLeeden
Scoreloop/RIM
