> 1. is this a nodetool bug? is there any way to propagate the
> java.io.IOException back to nodetool?

The repair keeps running even if nodetool fails; it is a server-side operation.

> 2. network problems on EC2, I'm shocked! are there recommended
> network settings for EC2?

Streaming does not put a timeout on the socket. In this case, check the 10.82.233.59 node to see why the pipe broke.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/03/2013, at 4:28 PM, Dane Miller <d...@optimalsocial.com> wrote:

> On Wed, Mar 13, 2013 at 12:39 PM, Wei Zhu <wz1...@yahoo.com> wrote:
>> My guess would be there is some exception during the repair and your session
>> is aborted.
>> Here is the code of doing repair:
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/AntiEntropyService.java
>>
>> looking for
>>
>> logger.info
>>
>> Compare that with your log file, it should give you a rough idea in which
>> stage repaired died.
>
> Thanks for the link to the source. That's a little hard to grok, but
> your suggestion to examine the logs more thoroughly was helpful. I
> was able to determine that repair hung due to connection errors during
> streaming. I'll include log snippets below, but this leads me to
> other more important questions...
>
> 1. is this a nodetool bug? is there any way to propagate the
> java.io.IOException back to nodetool?
> 2. network problems on EC2, I'm shocked! are there recommended
> network settings for EC2?
>
> Dane
>
> Here are the relevant logs showing (A) repair progress, and (B)
> java.io.IOExceptions
>
> (A) repair progress
> INFO [Thread-5314] 2013-03-11 23:29:28,866 StorageService.java (line
> 2364) Starting repair command #9, repairing 1 ranges for keyspace
> OpsCenter
> INFO [AntiEntropySessions:13] 2013-03-11 23:29:28,867
> AntiEntropyService.java (line 652) [repair
> #84e86020-8aa3-11e2-abb2-17112e360b9a] new session: will sync
> /10.34.37.195, /10.82.233.59 on range
> (0,28356863910078205288614550619314017621] for OpsCenter.[events,
> rollups60, settings, pdps, rollups86400, events_timeline, rollups300,
> rollups7200]
> INFO [Thread-5320] 2013-03-11 23:29:29,198 AntiEntropyService.java
> (line 765) [repair #84e86020-8aa3-11e2-abb2-17112e360b9a] events is
> fully synced (7 remaining column family to sync for this session)
> INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,198
> AntiEntropyService.java (line 765) [repair
> #84e86020-8aa3-11e2-abb2-17112e360b9a] settings is fully synced (6
> remaining column family to sync for this session)
> INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,617
> AntiEntropyService.java (line 765) [repair
> #84e86020-8aa3-11e2-abb2-17112e360b9a] pdps is fully synced (5
> remaining column family to sync for this session)
> INFO [Streaming to /10.82.233.59:34] 2013-03-11 23:38:12,491
> AntiEntropyService.java (line 765) [repair
> #84e86020-8aa3-11e2-abb2-17112e360b9a] rollups86400 is fully synced (4
> remaining column family to sync for this session)
> INFO [Streaming to /10.82.233.59:36] 2013-03-11 23:39:55,886
> AntiEntropyService.java (line 765) [repair
> #84e86020-8aa3-11e2-abb2-17112e360b9a] rollups7200 is fully synced (3
> remaining column family to sync for this session)
>
>
> (B) java.io.IOException
> # grep -A1 ERROR /var/log/cassandra/system.log.2
> ERROR [Streaming to /10.82.233.59:34] 2013-03-11 23:38:12,654
> CassandraDaemon.java (line 132) Exception in thread Thread[Streaming
> to /10.82.233.59:34,5,main]
> java.lang.RuntimeException: java.io.IOException: Connection reset by peer
> --
> ERROR [Streaming to /10.82.233.59:35] 2013-03-11 23:38:12,692
> CassandraDaemon.java (line 132) Exception in thread Thread[Streaming
> to /10.82.233.59:35,5,main]
> java.lang.RuntimeException: java.io.IOException: Broken pipe
> --
> ERROR [Streaming to /10.82.233.59:36] 2013-03-11 23:39:55,932
> CassandraDaemon.java (line 132) Exception in thread Thread[Streaming
> to /10.82.233.59:36,5,main]
> java.lang.RuntimeException: java.io.IOException: Broken pipe
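
For anyone repeating the log check suggested in the quoted thread, here is a rough Python sketch that works out which column families a repair session never reported as synced. The helper name `repair_progress` and both regular expressions are assumptions of mine, modelled only on the log wording quoted above; other Cassandra versions may phrase these messages differently. The sample entries are the quoted log lines with the mail wrapping removed.

```python
import re

# Patterns modelled on the log lines quoted in this thread (an assumption;
# the wording may differ between Cassandra versions).
NEW_SESSION = re.compile(
    r"\[repair #(?P<sid>[0-9a-f-]+)\] new session: .* for "
    r"(?P<ks>\w+)\.\[(?P<cfs>[^\]]*)\]"
)
SYNCED = re.compile(r"\[repair #(?P<sid>[0-9a-f-]+)\] (?P<cf>\w+) is fully synced")


def repair_progress(lines):
    """Map each repair session id to its expected, synced and pending column families."""
    sessions = {}
    for line in lines:
        m = NEW_SESSION.search(line)
        if m:
            cfs = [c.strip() for c in m.group("cfs").split(",")]
            sessions[m.group("sid")] = {"expected": cfs, "synced": []}
            continue
        m = SYNCED.search(line)
        if m and m.group("sid") in sessions:
            sessions[m.group("sid")]["synced"].append(m.group("cf"))
    for s in sessions.values():
        s["pending"] = [c for c in s["expected"] if c not in s["synced"]]
    return sessions


SID = "84e86020-8aa3-11e2-abb2-17112e360b9a"

# Unwrapped copies of the entries quoted above.
SAMPLE = [
    "INFO [AntiEntropySessions:13] 2013-03-11 23:29:28,867 AntiEntropyService.java (line 652) "
    "[repair #" + SID + "] new session: will sync /10.34.37.195, /10.82.233.59 on range "
    "(0,28356863910078205288614550619314017621] for OpsCenter.[events, rollups60, settings, "
    "pdps, rollups86400, events_timeline, rollups300, rollups7200]",
    "INFO [Thread-5320] 2013-03-11 23:29:29,198 AntiEntropyService.java (line 765) "
    "[repair #" + SID + "] events is fully synced (7 remaining column family to sync for this session)",
    "INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,198 AntiEntropyService.java (line 765) "
    "[repair #" + SID + "] settings is fully synced (6 remaining column family to sync for this session)",
    "INFO [AntiEntropyStage:1] 2013-03-11 23:38:02,617 AntiEntropyService.java (line 765) "
    "[repair #" + SID + "] pdps is fully synced (5 remaining column family to sync for this session)",
    "INFO [Streaming to /10.82.233.59:34] 2013-03-11 23:38:12,491 AntiEntropyService.java (line 765) "
    "[repair #" + SID + "] rollups86400 is fully synced (4 remaining column family to sync for this session)",
    "INFO [Streaming to /10.82.233.59:36] 2013-03-11 23:39:55,886 AntiEntropyService.java (line 765) "
    "[repair #" + SID + "] rollups7200 is fully synced (3 remaining column family to sync for this session)",
]

if __name__ == "__main__":
    for sid, state in repair_progress(SAMPLE).items():
        print(sid, "still pending:", state["pending"])
```

On the sample entries this reports rollups60, events_timeline and rollups300 as still pending, which agrees with the "3 remaining" count in the last INFO line. To run it against a real log, pass an open file handle (for example the system.log.2 file grepped above) instead of `SAMPLE`.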