The repair result is as follows (we ran it on Friday): Cannot proceed on
repair because a neighbor (/192.168.61.201) is dead: session failed

But to be honest, the neighbor did not die. The repair seemed to trigger a
series of full GC events on the initiating node. The results from the logs are:

[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range (85070591730234615865843651857942052874,102084710076281535261119195933814292480] failed with error org.apache.cassandra.exceptions.RepairException: [repair #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events, (85070591730234615865843651857942052874,102084710076281535261119195933814292480]] Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range (68056473384187696470568107782069813248,85070591730234615865843651857942052874] failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range (42535295865117307932921825928971026442,68056473384187696470568107782069813248] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range (127605887595351923798765477786913079306,136112946768375392941136215564139626496] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,619] Repair session c48d6000-b971-11e4-bc97-e9a66e5b2124 for range (136112946768375392941136215564139626496,0] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair session c48d6001-b971-11e4-bc97-e9a66e5b2124 for range (102084710076281535261119195933814292480,127605887595351923798765477786913079306] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair command #2 finished
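Tallying the failed sessions by cause makes the pattern in this output easier to see. Below is a small Python sketch (not part of Cassandra or nodetool) that parses the session-failure lines above, with the token ranges abridged to "(...]", and counts sessions per error cause and host. The cause labels and helper names are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Failure lines from the repair output above, one session per line.
# Token ranges are abridged to "(...]"; session ids, hosts and error
# texts are copied from the log.
NEIGHBOR_DEAD = ("java.io.IOException: Cannot proceed on repair because a "
                 "neighbor (/192.168.61.201) is dead: session failed")
LOG_LINES = [
    "Repair session d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range (...] "
    "failed with error org.apache.cassandra.exceptions.RepairException: "
    "[repair #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on "
    "prem_maelstrom_2/customer_events, (...]] "
    "Sync failed between /192.168.71.196 and /192.168.61.199",
    "Repair session eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range (...] "
    "failed with error java.io.IOException: Endpoint /192.168.61.199 died",
] + [
    f"Repair session {sid} for range (...] failed with error {NEIGHBOR_DEAD}"
    for sid in (
        "c48aef00-b971-11e4-bc97-e9a66e5b2124",
        "c48d38f0-b971-11e4-bc97-e9a66e5b2124",
        "c48d38f1-b971-11e4-bc97-e9a66e5b2124",
        "c48d6000-b971-11e4-bc97-e9a66e5b2124",
        "c48d6001-b971-11e4-bc97-e9a66e5b2124",
    )
]

# Matches "Repair session <id> for range <range> failed with error <error>".
SESSION_RE = re.compile(r"Repair session (\S+) for range .*? failed with error (.+)")

# Hypothetical cause labels, each with a regex capturing the host(s) involved.
CAUSES = [
    ("neighbor reported dead", re.compile(r"neighbor \(/([\d.]+)\) is dead")),
    ("endpoint died", re.compile(r"Endpoint /([\d.]+) died")),
    ("sync failed", re.compile(r"Sync failed between /([\d.]+) and /([\d.]+)")),
]


def classify(error: str) -> tuple[str, tuple[str, ...]]:
    """Return (cause label, hosts) for the first cause pattern that matches."""
    for label, pattern in CAUSES:
        match = pattern.search(error)
        if match:
            return label, match.groups()
    return "other", ()


def summarize(lines: list[str]) -> Counter:
    """Count failed repair sessions per (cause, hosts); skip other lines."""
    counts: Counter = Counter()
    for line in lines:
        match = SESSION_RE.search(line)
        if match is None:
            continue
        counts[classify(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    counts = summarize(LOG_LINES)
    for (label, hosts), n in counts.most_common():
        print(f"{n} x {label}: {' / '.join(hosts)}")
    print("failed sessions:", sum(counts.values()))
```

Run as a script, it counts 7 failed sessions, one for each of the "7 ranges" in the repair command: 5 report /192.168.61.201 as dead, 1 reports /192.168.61.199 as died, and 1 is a sync failure between /192.168.71.196 and /192.168.61.199.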


We tried to run the repair one more time. After 24 hours we got some streaming
errors. Moreover, we had to stop it because we started getting write timeouts
on the client :(

We checked iostat while we were getting write timeouts. An example from one
node in DC_A is here:
The file also contains tpstats from all nodes. Nodes whose names start with
"z" are in DC_B; the rest are in DC_A.
Cassandra's data and commit log are on disk dm-XX.

I also read
http://jonathanhui.com/cassandra-performance-tuning-and-monitoring and I am
considering:
1) memtable configuration: do you have any suggestions?
2) running INSERTs in batch statements: I am not sure whether this reduces IO.
Again, do you have any experience with this?
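Regarding 2): guidance commonly given for Cassandra is that a batch saves work mainly when all of its statements share one partition key, since such a batch is applied as a single mutation, whereas a batch spanning many partitions mostly shifts work onto the coordinator rather than reducing IO. Whether that would help with these write timeouts is untested here. Below is a minimal, driver-free Python sketch of grouping rows by partition key before batching. The row shape, the choice of `customer_id` as the partition key of `customer_events`, and the `max_size` cap are all assumptions for illustration; no Cassandra driver API is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape (customer_id, event_time, payload); customer_id is
# ASSUMED to be the partition key. The real schema is not shown in the thread.
Row = Tuple[str, int, str]


def single_partition_batches(rows: Iterable[Row], max_size: int = 50) -> Iterator[List[Row]]:
    """Yield lists of rows where every list shares one partition key and holds
    at most max_size rows (50 is an arbitrary illustrative cap)."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    by_key: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_key[row[0]].append(row)  # group by the assumed partition key
    for key_rows in by_key.values():
        # Split large partitions into chunks of at most max_size rows.
        for start in range(0, len(key_rows), max_size):
            yield key_rows[start:start + max_size]


if __name__ == "__main__":
    rows = [("c1", 1, "a"), ("c2", 2, "b"), ("c1", 3, "c")]
    for batch in single_partition_batches(rows):
        # Each list would become one single-partition batch in a client driver.
        print(batch)
```

With the sample rows, the two "c1" rows end up in one batch and the "c2" row in another, so no batch touches more than one partition.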

Any tips would be helpful.

Regards
Piotrek

On Thu, Feb 19, 2015 at 10:34 AM, Roland Etzenhammer <r.etzenham...@t-online.de> wrote:

> Hi,
>
> 2.1.3 is now the official latest release; I checked this morning and got
> this nice surprise. Now it's update time. Thanks to everyone involved; if
> I meet any of you, there's a beer from me :-)
>
> The changelist is rather long:
> https://git1-us-west.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3
>
> Hopefully that will solve many of those oddities and not introduce too many
> new ones :-)
>
> Cheers,
> Roland
