> Now, pretty much every single scenario points towards a connectivity
> problem; however, we also have a few PostgreSQL replication streams
In the before time someone had problems with a switch/router that was dropping persistent but idle connections. I doubt this applies, and it would probably result in an error, but just throwing it out there.
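If something in the path (the ASA, for instance) is silently timing out idle connections, one cheap thing to try is making the kernel probe idle sockets well inside the firewall's idle timeout. This is only a sketch; the numbers are illustrative, assume Linux nodes, and only help sockets that were opened with SO_KEEPALIVE:

# check the current keepalive settings (Linux defaults are 7200 / 75 / 9)
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

# first probe after 60s idle, then every 10s, give up after 6 failed probes
sudo sysctl -w net.ipv4.tcp_keepalive_time=60
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
sudo sysctl -w net.ipv4.tcp_keepalive_probes=6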
Have you combed through the logs looking for errors or warnings?

I would repair a single small CF with -pr and watch closely.

Consider setting DEBUG logging (you can do it via JMX) for:

org.apache.cassandra.service.AntiEntropyService <- the class that manages repair
org.apache.cassandra.streaming <- the package that handles streaming

(rough commands for both suggestions are at the bottom of this mail, below the quote)

There was a fix to repair in 1.0.11, but that has to do with streaming:
https://github.com/apache/cassandra/blob/cassandra-1.0/CHANGES.txt#L5

Good luck.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/07/2012, at 10:16 PM, Bart Swedrowski wrote:

> Hello everyone,
>
> I'm facing quite a weird problem with Cassandra since we added a
> second DC to our cluster, and I have totally run out of ideas; this
> email is a call for help/advice!
>
> The history:
> - we used to have 4 nodes in a single DC
> - running Cassandra 0.8.7
> - RF:3
> - around 50GB of data on each node
> - RandomPartitioner and SimpleSnitch
>
> All was working fine for over 9 months. A few weeks ago we decided we
> wanted to add another 4 nodes in a second DC and join them to the
> cluster. Prior to doing that, we upgraded Cassandra to 1.0.9 to push
> it out of the door before the multi-DC work. After the upgrade, we
> left it running for over a week and it was all good; no issues.
>
> Then we added the 4 additional nodes in the other DC, bringing the
> cluster to 8 nodes in total, spread across two DCs, so now we have:
> - 8 nodes across 2 DCs, 4 in each DC
> - a 100Mbps low-latency connection (sub 5ms) running over a Cisco ASA
>   site-to-site VPN (which is IKEv1 based)
> - RF of DC1:3,DC2:3
> - RandomPartitioner, now using PropertyFileSnitch
>
> nodetool ring looks as follows:
> $ nodetool -h localhost ring
> Address         DC   Rack  Status  State   Load      Owns    Token
>                                                               148873535527910577765226390751398592512
> 192.168.81.2    DC1  RC1   Up      Normal  37.9 GB   12.50%  0
> 192.168.81.3    DC1  RC1   Up      Normal  35.32 GB  12.50%  21267647932558653966460912964485513216
> 192.168.81.4    DC1  RC1   Up      Normal  39.51 GB  12.50%  42535295865117307932921825928971026432
> 192.168.81.5    DC1  RC1   Up      Normal  19.42 GB  12.50%  63802943797675961899382738893456539648
> 192.168.94.178  DC2  RC1   Up      Normal  40.72 GB  12.50%  85070591730234615865843651857942052864
> 192.168.94.179  DC2  RC1   Up      Normal  30.42 GB  12.50%  106338239662793269832304564822427566080
> 192.168.94.180  DC2  RC1   Up      Normal  30.94 GB  12.50%  127605887595351923798765477786913079296
> 192.168.94.181  DC2  RC1   Up      Normal  12.75 GB  12.50%  148873535527910577765226390751398592512
>
> (please ignore the fact that the nodes are not interleaved; they
> should be, but there was a hiccup during the implementation phase.
> Unless *this* is the problem!)
>
> Now, the problem: over 7 out of 10 manual repairs do not finish. They
> usually get stuck and show 3 different symptoms:
>
> 1) Say node 192.168.81.2 runs a manual repair. It requests merkle
> trees from 192.168.81.2, 192.168.81.3, 192.168.81.5, 192.168.94.178,
> 192.168.94.179 and 192.168.94.181. It receives them from 192.168.81.2,
> 192.168.81.3, 192.168.81.5, 192.168.94.178 and 192.168.94.179, but not
> from 192.168.94.181. The logs on 192.168.94.181 say that it has sent
> the merkle tree back, but it is never received by 192.168.81.2.
> 2) Say node 192.168.81.2 runs a manual repair. It requests merkle
> trees from 192.168.81.2, 192.168.81.3, 192.168.81.5, 192.168.94.178,
> 192.168.94.179 and 192.168.94.181.
> It receives them from 192.168.81.2,
> 192.168.81.3, 192.168.81.5, 192.168.94.178 and 192.168.94.179, but not
> from 192.168.94.181. The logs on 192.168.94.181 do not say *anything*
> about a merkle tree being sent, and compactionstats does not show them
> being validated (generated) either.
> 3) The merkle trees are delivered, and the nodes start sending data
> across to sync themselves. On certain occasions they get "stuck"
> streaming files between each other at 100% and won't move forward.
> Now the interesting bit is that the ones that get stuck are always
> placed in different DCs!
>
> Now, pretty much every single scenario points towards a connectivity
> problem; however, we also have a few PostgreSQL replication streams
> running over this connection, some other traffic, and quite a lot of
> monitoring, and none of those are affected in any way.
>
> Also, if random packets were being lost, I'd expect TCP to correct
> that (re-transmit them).
>
> It doesn't matter whether it's a full manual repair or just a -pr
> repair; both end in pretty much the same way.
>
> Has anyone come across this kind of issue before, or any ideas how
> else I could investigate this? The issue is pressing me massively as
> this is our live cluster and I have to run repairs by hand (usually
> multiple times before one finally goes through) every single day… And
> I'm also not sure whether the cluster is being affected in any other
> way.
>
> I've gone through the Jira issues and considered upgrading to 1.1.x,
> but I can't see anything that even looks like what is happening to my
> cluster.
>
> If any further information, like logs or configuration files, is
> needed, please let me know.
>
> Any information, suggestions or advice - greatly appreciated.
>
> Kind regards,
> Bart
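Roughly what I had in mind for the DEBUG logging and the single-CF repair. Treat it as a sketch only: the MBean name and JMX port are the 1.0 defaults, jmxterm.jar stands in for whatever jmxterm build you have, and YourKeyspace/SmallCF are placeholders for whatever you pick:

# flip the two loggers over JMX with jmxterm
$ java -jar jmxterm.jar -l localhost:7199
$> run -b org.apache.cassandra.db:type=StorageService setLog4jLevel org.apache.cassandra.service.AntiEntropyService DEBUG
$> run -b org.apache.cassandra.db:type=StorageService setLog4jLevel org.apache.cassandra.streaming DEBUG

# or add the same levels to conf/log4j-server.properties (picked up after a short
# delay if your build watches the file, otherwise on the next restart)
log4j.logger.org.apache.cassandra.service.AntiEntropyService=DEBUG
log4j.logger.org.apache.cassandra.streaming=DEBUG

# then repair a single small CF on its primary range and watch the logs on both ends
$ nodetool -h localhost repair -pr YourKeyspace SmallCF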