On Wed, Mar 13, 2013 at 12:39 PM, Wei Zhu <wz1...@yahoo.com> wrote:
> My guess is that an exception occurred during the repair and your
> session was aborted.
> Here is the code that performs the repair:
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/AntiEntropyService.java
>
> Look for
>
> logger.info
>
> and compare that with your log file; it should give you a rough idea of
> the stage at which the repair died.

Thanks for the link to the source.  That's a little hard to grok, but
your suggestion to examine the logs more thoroughly was helpful.  I
was able to determine that repair hung due to connection errors during
streaming.  I'll include log snippets below, but this leads me to
other more important questions...

1. Is this a nodetool bug?  Is there any way to propagate the
java.io.IOException back to nodetool?
2. Network problems on EC2, I'm shocked!  Are there recommended
network settings for EC2?

Dane

Here are the relevant logs showing (A) repair progress, and (B)
java.io.IOExceptions

(A) repair progress
INFO [Thread-5314] 2013-03-11 23:29:28,866 StorageService.java (line 2364) Starting repair command #9, repairing 1 ranges for keyspace OpsCenter
 INFO [AntiEntropySessions:13] 2013-03-11 23:29:28,867 AntiEntropyService.java (line 652) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] new session: will sync /10.34.37.195, /10.82.233.59 on range (0,28356863910078205288614550619314017621] for OpsCenter.[events, rollups60, settings, pdps, rollups86400, events_timeline, rollups300, rollups7200]
 INFO [Thread-5320] 2013-03-11 23:29:29,198 AntiEntropyService.java (line 765) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] events is fully synced (7 remaining column family to sync for this session)
 INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,198 AntiEntropyService.java (line 765) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] settings is fully synced (6 remaining column family to sync for this session)
 INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,617 AntiEntropyService.java (line 765) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] pdps is fully synced (5 remaining column family to sync for this session)
 INFO [Streaming to /10.82.233.59:34] 2013-03-11 23:38:12,491 AntiEntropyService.java (line 765) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] rollups86400 is fully synced (4 remaining column family to sync for this session)
 INFO [Streaming to /10.82.233.59:36] 2013-03-11 23:39:55,886 AntiEntropyService.java (line 765) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] rollups7200 is fully synced (3 remaining column family to sync for this session)
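
To see which column families never reported completion, the "fully
synced" lines can be compared against the list announced at session
start. This is only a minimal sketch, assuming the log format shown in
(A); the function and variable names are my own and are not part of
Cassandra or nodetool.

```python
import re

# Excerpt of the (A) log lines, including the mail client's line wrapping.
SAMPLE_LOG = """\
 INFO [AntiEntropySessions:13] 2013-03-11 23:29:28,867
AntiEntropyService.java (line 652) [repair
#84e86020-8aa3-11e2-abb2-17112e360b9a] new session: will sync
/10.34.37.195, /10.82.233.59 on range
(0,28356863910078205288614550619314017621] for OpsCenter.[events,
rollups60, settings, pdps, rollups86400, events_timeline, rollups300,
rollups7200]
 INFO [Thread-5320] 2013-03-11 23:29:29,198 AntiEntropyService.java
(line 765) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] events is
fully synced (7 remaining column family to sync for this session)
 INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,198
AntiEntropyService.java (line 765) [repair
#84e86020-8aa3-11e2-abb2-17112e360b9a] settings is fully synced (6
remaining column family to sync for this session)
 INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,617
AntiEntropyService.java (line 765) [repair
#84e86020-8aa3-11e2-abb2-17112e360b9a] pdps is fully synced (5
remaining column family to sync for this session)
 INFO [Streaming to /10.82.233.59:34] 2013-03-11 23:38:12,491
AntiEntropyService.java (line 765) [repair
#84e86020-8aa3-11e2-abb2-17112e360b9a] rollups86400 is fully synced (4
remaining column family to sync for this session)
 INFO [Streaming to /10.82.233.59:36] 2013-03-11 23:39:55,886
AntiEntropyService.java (line 765) [repair
#84e86020-8aa3-11e2-abb2-17112e360b9a] rollups7200 is fully synced (3
remaining column family to sync for this session)
"""

# "new session" line: captures session id, keyspace, and column family list.
SESSION_RE = re.compile(
    r"\[repair #([0-9a-f-]+)\] new session: will sync .*? for (\w+)\.\[([^\]]*)\]"
)
# "fully synced" line: captures session id and column family name.
SYNCED_RE = re.compile(r"\[repair #([0-9a-f-]+)\] (\w+) is fully synced")


def unsynced_column_families(log_text):
    """Map each repair session id to the column families not yet reported synced."""
    # Collapse all whitespace so entries wrapped across lines still match.
    text = " ".join(log_text.split())
    pending = {}
    for session_id, _keyspace, families in SESSION_RE.findall(text):
        pending[session_id] = [name.strip() for name in families.split(",")]
    for session_id, family in SYNCED_RE.findall(text):
        if family in pending.get(session_id, []):
            pending[session_id].remove(family)
    return pending


if __name__ == "__main__":
    for session_id, families in unsynced_column_families(SAMPLE_LOG).items():
        print(session_id, "still pending:", ", ".join(families))
```

For the excerpt above this leaves rollups60, events_timeline and
rollups300 pending, which matches the "3 remaining" count in the last
progress line.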


(B) java.io.IOException
# grep -A1 ERROR /var/log/cassandra/system.log.2
ERROR [Streaming to /10.82.233.59:34] 2013-03-11 23:38:12,654 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.82.233.59:34,5,main]
java.lang.RuntimeException: java.io.IOException: Connection reset by peer
--
ERROR [Streaming to /10.82.233.59:35] 2013-03-11 23:38:12,692 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.82.233.59:35,5,main]
java.lang.RuntimeException: java.io.IOException: Broken pipe
--
ERROR [Streaming to /10.82.233.59:36] 2013-03-11 23:39:55,932 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.82.233.59:36,5,main]
java.lang.RuntimeException: java.io.IOException: Broken pipe
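
Since nodetool itself reported nothing, the grep output in (B) can be
summarized per peer and per cause with a small script. Again, this is
a rough sketch that assumes the `grep -A1 ERROR` output format shown
above (entries separated by "--" lines); the names are illustrative
and nothing here is a Cassandra API.

```python
import re
from collections import Counter

# The (B) grep output, including the mail client's line wrapping.
SAMPLE_GREP_OUTPUT = """\
ERROR [Streaming to /10.82.233.59:34] 2013-03-11 23:38:12,654
CassandraDaemon.java (line 132) Exception in thread Thread[Streaming
to /10.82.233.59:34,5,main]
java.lang.RuntimeException: java.io.IOException: Connection reset by peer
--
ERROR [Streaming to /10.82.233.59:35] 2013-03-11 23:38:12,692
CassandraDaemon.java (line 132) Exception in thread Thread[Streaming
to /10.82.233.59:35,5,main]
java.lang.RuntimeException: java.io.IOException: Broken pipe
--
ERROR [Streaming to /10.82.233.59:36] 2013-03-11 23:39:55,932
CassandraDaemon.java (line 132) Exception in thread Thread[Streaming
to /10.82.233.59:36,5,main]
java.lang.RuntimeException: java.io.IOException: Broken pipe
"""

# grep prints a line containing only "--" between groups of context lines.
SEPARATOR_RE = re.compile(r"^--$", re.MULTILINE)
# Captures peer address, timestamp, and the IOException message.
ENTRY_RE = re.compile(
    r"ERROR \[Streaming to /([\d.]+):\d+\] "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r".*java\.io\.IOException: (.+)$"
)


def streaming_errors(grep_output):
    """Return (timestamp, peer, cause) for each streaming IOException entry."""
    errors = []
    for chunk in SEPARATOR_RE.split(grep_output):
        # Collapse whitespace so an entry wrapped across lines becomes one line.
        entry = " ".join(chunk.split())
        match = ENTRY_RE.search(entry)
        if match:
            peer, timestamp, cause = match.groups()
            errors.append((timestamp, peer, cause))
    return errors


def count_by_cause(errors):
    """Count how many times each IOException message occurred."""
    return Counter(cause for _timestamp, _peer, cause in errors)


if __name__ == "__main__":
    found = streaming_errors(SAMPLE_GREP_OUTPUT)
    for timestamp, peer, cause in found:
        print(timestamp, peer, cause)
    print(dict(count_by_cause(found)))
```

On the excerpt above this reports three failures, all against
10.82.233.59: one "Connection reset by peer" and two "Broken pipe".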
