Hi Jeff,
Thanks for your reply.
In fact, I have tried all of these options.
1. We use Cassandra Reaper for our repairs, which does subrange repair.
2. I have also developed a shell script that does exactly what Reaper does,
but with control over how many repair sessions run concurrently.
3. I have also tried a full repair.
4. I have tried running repair in two DCs at a time. While the repair between
DC1 and DC2 goes fine, repair between DC1 and DC3 or between DC2 and DC3 fails.
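Roughly, the concurrency control in my script (item 2 above) works like this - sketched here in Python, where repair_subrange is a placeholder for the real nodetool call and the token pairs are made up:

```python
# Sketch of capping concurrent repair sessions: a thread pool of size 2
# means at most two subranges are being repaired at any one time.
from concurrent.futures import ThreadPoolExecutor

def repair_subrange(start, end):
    # Placeholder: the real script shells out to something like
    #   nodetool repair -st <start> -et <end> <keyspace>
    return f"repaired subrange {start}:{end}"

subranges = [(0, 100), (100, 200), (200, 300)]  # made-up token pairs
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda r: repair_subrange(*r), subranges))
for line in results:
    print(line)
```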
So I will try setting the inter-DC stream throughput to 20 Mbps and see how that goes.
Is there anything else that could be done in this case?
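For clarity, the setting I plan to change is the cassandra.yaml throttle (assuming this is still the key name in 2.1; the value is in megabits per second, default 200, set per node):

```yaml
# cassandra.yaml - throttle cross-DC streaming to 20 Mbit/s
inter_dc_stream_throughput_outbound_megabits_per_sec: 20
```

Newer versions can also change this at runtime via nodetool setinterdcstreamthroughput, but I am not sure that command exists in 2.1.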
Thanks
Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : [email protected]<mailto:[email protected]>
From: Jeff Jirsa [mailto:[email protected]]
Sent: Friday, September 15, 2017 10:27 AM
To: cassandra <[email protected]>
Subject: [EXTERNAL] Re: Cassandra repair process in Low Bandwidth Network
Hi Kishore,
Just to make sure we're all on the same page, I presume you're doing full
repairs using something like 'nodetool repair -pr', which repairs all data for
a given token range across all of your hosts in all of your dcs. Is that a
correct assumption to start?
In addition to throttling inter-dc stream throughput (which you should be able
to set quite low - perhaps as low as 20 Mbps), you may also want to consider
smaller ranges (using a concept we call subrange repair, where instead of using
-pr, you pass -st and -et - which is what tools like
http://cassandra-reaper.io/
do ) - this will keep streams smaller (in terms of total bytes transferred per
streaming session, though you'll have more sessions). Finally, you can use
-host and -dc options to limit repair so that sessions don't always hit all 3
dcs - for example, you could do a repair between DC1 and DC2 using -dc, then do
a repair of DC1 and DC3 using -dc - it requires a lot more coordination, but
likely helps cut down on the traffic over your VPN link.
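To illustrate the subrange idea, splitting the full token range into pieces might look something like this (a hypothetical helper, not part of nodetool; assumes the Murmur3Partitioner token range and a made-up keyspace name):

```python
# Split the full Murmur3 token range into n contiguous subranges; each
# pair becomes one 'nodetool repair -st/-et' invocation.
MIN_TOKEN = -(2**63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1     # Murmur3Partitioner maximum token

def subranges(n):
    """Yield (start, end) pairs covering the full token range."""
    step = (MAX_TOKEN - MIN_TOKEN) // n
    start = MIN_TOKEN
    for i in range(n):
        end = MAX_TOKEN if i == n - 1 else start + step
        yield (start, end)
        start = end

for st, et in subranges(4):
    print(f"nodetool repair -st {st} -et {et} my_keyspace")
```

More subranges mean more (but smaller) streaming sessions, which is exactly the tradeoff described above.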
On Fri, Sep 15, 2017 at 9:09 AM, Mohapatra, Kishore
<[email protected]<mailto:[email protected]>> wrote:
Hi,
We have a Cassandra cluster with 7 nodes in each of 3 datacenters, running
C* version 2.1.15.4.
Network bandwidth between DC1 and DC2 is very good (10 Gbit/s) and dedicated.
However, the network pipe between DC1 and DC3 and between DC2 and DC3 is very
poor, at only 100 Mbit/s, and also goes through a VPN. Each node holds about
100 GB of data, and the keyspace has an RF of 3. Whenever we run the repair, it
fails with streaming errors and never completes. I have already tried setting
the streaming timeout parameter to a very high value, but it did not help. I
can repair either just the local DC or just the first two DCs, but cannot
repair DC3 when I combine it with the other two DCs.
So how can I successfully repair the keyspace in this kind of environment?
I see that there is a parameter to throttle the inter-DC stream throughput,
which defaults to 200 Mbit/s. What is the minimum I could set it to without
affecting the cluster?
Is there any other way to work in this kind of environment?
I would appreciate your feedback and help on this.
Thanks
Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : [email protected]<mailto:[email protected]>