Hi Jeff,
Thanks for your reply.
In fact, I have tried all of these options.
1. We use Cassandra Reaper for our repairs, which does subrange repair.
2. I have also developed a shell script that does exactly what Reaper does,
but with control over how many repair sessions run concurrently.
3. I have also tried a full repair.
4. I have tried running repair in two DCs at a time. While the repair between
DC1 and DC2 goes fine, repair between DC1 and DC3 or between DC2 and DC3 fails.
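Roughly, the concurrency control in my script (item 2 above) works like this - sketched here in Python, where repair_subrange is a placeholder for the real nodetool call and the token pairs are made up:

```python
# Sketch of capping concurrent repair sessions: a thread pool of size 2
# means at most two subranges are being repaired at any one time.
from concurrent.futures import ThreadPoolExecutor

def repair_subrange(start, end):
    # Placeholder: the real script shells out to something like
    #   nodetool repair -st <start> -et <end> <keyspace>
    return f"repaired subrange {start}:{end}"

subranges = [(0, 100), (100, 200), (200, 300)]  # made-up token pairs
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda r: repair_subrange(*r), subranges))
for line in results:
    print(line)
```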
So I will try setting the inter-DC stream throughput to 20 Mbps and see how that goes.
Is there anything else that could be done in this case?
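For clarity, the setting I plan to change is the cassandra.yaml throttle (assuming this is still the key name in 2.1; the value is in megabits per second, default 200, set per node):

```yaml
# cassandra.yaml - throttle cross-DC streaming to 20 Mbit/s
inter_dc_stream_throughput_outbound_megabits_per_sec: 20
```

Newer versions can also change this at runtime via nodetool setinterdcstreamthroughput, but I am not sure that command exists in 2.1.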
Thanks
Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : [email protected]<mailto:[email protected]>
From: Jeff Jirsa [mailto:[email protected]]
Sent: Friday, September 15, 2017 10:27 AM
To: cassandra <[email protected]>
Subject: [EXTERNAL] Re: Cassandra repair process in Low Bandwidth Network
Hi Kishore,
Just to make sure we're all on the same page, I presume you're doing full
repairs using something like 'nodetool repair -pr', which repairs all data for
a given token range across all of your hosts in all of your dcs. Is that a
correct assumption to start?
In addition to throttling inter-dc stream throughput (which you should be able
to set quite low - perhaps as low as 20 Mbps), you may also want to consider
smaller ranges (using a concept we call subrange repair, where instead of using
-pr, you pass -st and -et - which is what tools like
http://cassandra-reaper.io/
do ) - this will keep streams smaller (in terms of total bytes transferred per
streaming session, though you'll have more sessions). Finally, you can use
-host and -dc options to limit repair so that sessions don't always hit all 3
dcs - for example, you could do a repair between DC1 and DC2 using -dc, then do
a repair of DC1 and DC3 using -dc - it requires a lot more coordination, but
likely helps cut down on the traffic over your VPN link.
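To illustrate the subrange idea, splitting the full token range into pieces might look something like this (a hypothetical helper, not part of nodetool; assumes the Murmur3Partitioner token range and a made-up keyspace name):

```python
# Split the full Murmur3 token range into n contiguous subranges; each
# pair becomes one 'nodetool repair -st/-et' invocation.
MIN_TOKEN = -(2**63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1     # Murmur3Partitioner maximum token

def subranges(n):
    """Yield (start, end) pairs covering the full token range."""
    step = (MAX_TOKEN - MIN_TOKEN) // n
    start = MIN_TOKEN
    for i in range(n):
        end = MAX_TOKEN if i == n - 1 else start + step
        yield (start, end)
        start = end

for st, et in subranges(4):
    print(f"nodetool repair -st {st} -et {et} my_keyspace")
```

More subranges mean more (but smaller) streaming sessions, which is exactly the tradeoff described above.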
On Fri, Sep 15, 2017 at 9:09 AM, Mohapatra, Kishore
<[email protected]<mailto:[email protected]>> wrote:
Hi,
We have a Cassandra cluster with 7 nodes in each of 3 datacenters, running
C* version 2.1.15.4.
Network bandwidth between DC1 and DC2 is very good (10 Gbit/s) and dedicated.
However, the network pipe between DC1 and DC3 and between DC2 and DC3 is very
poor, at only 100 Mbit/s, and also goes through a VPN. Each node holds about
100 GB of data, and the keyspace has an RF of 3. Whenever we run the repair, it
fails with streaming errors and never completes. I have already tried setting
the streaming timeout parameter to a very high value, but it did not help. I
can repair either just the local DC or just the first two DCs, but cannot
repair DC3 when I combine it with the other two DCs.
So how can I successfully repair the keyspace in this kind of environment?
I see that there is a parameter to throttle the inter-DC stream throughput,
which defaults to 200 Mbit/s. What is the minimum I could set it to without
affecting the cluster?
Is there any other way to work in this kind of environment?
I would appreciate your feedback and help on this.
Thanks
Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : [email protected]<mailto:[email protected]>