The repair results are as follows (we ran it on Friday):
Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
But to be honest, the neighbor did not die. The repair seemed to trigger a series of full GC events on the initiating node. The results from the logs are:

[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range (85070591730234615865843651857942052874,102084710076281535261119195933814292480] failed with error org.apache.cassandra.exceptions.RepairException: [repair #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events, (85070591730234615865843651857942052874,102084710076281535261119195933814292480]] Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range (68056473384187696470568107782069813248,85070591730234615865843651857942052874] failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range (42535295865117307932921825928971026442,68056473384187696470568107782069813248] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range (127605887595351923798765477786913079306,136112946768375392941136215564139626496] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,619] Repair session c48d6000-b971-11e4-bc97-e9a66e5b2124 for range (136112946768375392941136215564139626496,0] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair session c48d6001-b971-11e4-bc97-e9a66e5b2124 for range (102084710076281535261119195933814292480,127605887595351923798765477786913079306] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair command #2 finished

We tried to run the repair one more time. After 24 hours we had some streaming errors. Moreover, we had to stop it because we started to get write timeouts on the client :(

We checked iostat while we were getting the write timeouts. An example from one node in DC_A is here:
The file also contains tpstats from all nodes. Nodes starting with "z" are in DC_B, the rest are in DC_A.

Cassandra's data and commit log are on disk dm-XX.

I also read http://jonathanhui.com/cassandra-performance-tuning-and-monitoring and I am thinking about:
1) memtable configuration - do you have any suggestions?
2) running INSERTs in batch statements - I am not sure whether this reduces IO; again, do you have experience with this?

Any tips will be helpful.

Regards
Piotrek

On Thu, Feb 19, 2015 at 10:34 AM, Roland Etzenhammer <r.etzenham...@t-online.de> wrote:

> Hi,
>
> 2.1.3 is now the official latest release - I checked this morning and got
> this good surprise.
> Now it's update time - thanks to all guys involved, if
> I meet anyone one beer from me :-)
>
> The changelist is rather long:
> https://git1-us-west.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3
>
> Hopefully that will solve many of those oddities and not invent to much
> new ones :-)
>
> Cheers,
> Roland