The nodes in DC1 need to be able to reach the nodes in DC2 on the public (NAT'd) IP.
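One quick way to test that requirement is to try opening a TCP connection from a DC1 node to each DC2 node's public address on the storage port. The following is only a minimal sketch, not part of Cassandra or nodetool: the helper names can_connect and check_peers are made up for illustration, and port 7000 is assumed because it is the storage port mentioned in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable routes and timeouts.
        return False


def check_peers(peers, port=7000, timeout=3.0):
    """Map each peer address to whether the given port accepted a connection."""
    return {host: can_connect(host, port, timeout) for host in peers}


# Example call (hypothetical addresses, run from a DC1 node):
#   check_peers(["1.2.4.4", "1.2.4.5"])
```

A successful connect only shows that the port is routable from where the script runs; it says nothing about which address the remote node advertises back through gossip, so it should be run in both directions.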
Others may be able to provide some more details.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/06/2012, at 9:51 PM, Andras Szerdahelyi wrote:

> Aaron,
>
>> The broadcast_address allows a node to broadcast an address that is
>> different to the ones it's bound to on the local interfaces
>> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L270
>
> Yes, and that's not where the problem is, IMO. If you broadcast your
> translated address (say 1.2.3.4, a public IP), nodes outside your VPN'd
> network will have no problems connecting as long as they can route to this
> address (which they should), but any other nodes on the local net (e.g.
> 10.0.1.2) won't be able to connect/route to their neighbor, who's telling
> them to open the return socket to 1.2.3.4.
>
> Am I getting this right? At least this is what I experienced not so long
> ago:
>
> DC1 nodes
> a) 10.0.1.1 translated to 1.2.3.4 on NAT
> b) 10.0.1.2 translated to 1.2.3.5 on NAT
>
> DC2 nodes
> a) 10.0.2.1 translated to 1.2.4.4 on NAT
> b) 10.0.2.2 translated to 1.2.4.5 on NAT
>
> Let's assume DC2 nodes' broadcast_addresses are their public addresses.
>
> If DC1:a and DC1:b broadcast their public addresses, 1.2.3.4 and 1.2.3.5,
> they are advertising addresses that are not routable on their own network
> (loopback), but DC2:a and DC2:b can connect/route to them just fine.
> Nodetool ring on any DC1 node says the others in DC1 are down and
> everything else is up. Nodetool ring on any DC2 node says everything is up.
>
> If DC1:a and DC1:b broadcast their private addresses, they can connect to
> each other fine, but DC2:a and DC2:b will have no chance to route to them.
> Nodetool ring on any DC1 node says everything is up. Nodetool ring on any
> DC2 node says DC1 nodes are down.
>
> regards,
> Andras
>
> On 27 Jun 2012, at 11:29, aaron morton wrote:
>
>>> Setting up a Cassandra ring across NAT (without a VPN) is impossible in
>>> my experience.
>> The broadcast_address allows a node to broadcast an address that is
>> different to the ones it's bound to on the local interfaces
>> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L270
>>
>>>> 1) How can I make sure that the JIRA issue above is my real problem? (I see
>>>> no errors or warns in the logs; no other activity)
>> If the errors are not there, it is not your problem.
>>
>>>> - a full cluster restart allows the first attempted repair to complete
>>>> (haven't tested yet; this is not practical even if it works)
>> A rolling restart of the nodes involved in the repair is sufficient.
>>
>> Double-check the networking, and check the logs on both sides of the
>> transfer for errors or warnings. The code around streaming is better at
>> failing loudly nowadays.
>>
>> If you don't see anything, set DEBUG logging on
>> org.apache.cassandra.streaming.FileStreamTask. That will let you know if
>> things start and progress.
>>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 26/06/2012, at 6:16 PM, Alexandru Sicoe wrote:
>>
>>> Hi Andras,
>>>
>>> I am not using a VPN. The system had been running successfully in this
>>> configuration for a couple of weeks until I noticed the repair is not
>>> working.
>>>
>>> What happens is that I configure iptables on each Cassandra node to
>>> forward packets addressed to any of the IPs in the other DC (on ports
>>> 7000, 9160 and 7199) to the gateway IP. The gateway does the NAT, sending
>>> the packets on the other side to the real destination IP, having replaced
>>> the source IP with the initial sender's IP (at least in my understanding
>>> of it).
>>>
>>> What might be the problem given the configuration? How to fix this?
>>>
>>> Cheers,
>>> Alex
>>>
>>> On Mon, Jun 25, 2012 at 12:47 PM, Andras Szerdahelyi
>>> <andras.szerdahe...@ignitionone.com> wrote:
>>>
>>>> The DCs are communicating over a gateway where I do NAT for ports 7000,
>>>> 9160 and 7199.
>>>
>>> Ah, that sounds familiar. You don't mention if you are VPN'd or not. I'll
>>> assume you are not.
>>>
>>> So, your nodes are behind network address translation. Is that to say they
>>> advertise (broadcast) their internal or translated/forwarded IP to each
>>> other? Setting up a Cassandra ring across NAT (without a VPN) is
>>> impossible in my experience. Either the nodes on your local network won't
>>> be able to communicate with each other, because they broadcast their
>>> translated (public) address, which is normally (router configuration)
>>> not routable from within the local network, or the nodes broadcast their
>>> internal IP, in which case the "outside" nodes are helpless in trying to
>>> connect to a local net. On DC2 nodes/the node you issue the repair on,
>>> check for any sockets being opened to the internal addresses of the nodes
>>> in DC1.
>>>
>>> regards,
>>> Andras
>>>
>>> On 25 Jun 2012, at 11:57, Alexandru Sicoe wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I have a 2 DC (DC1:3 and DC2:6) Cassandra 1.0.7 setup. I have about
>>>> 300GB/node in DC2.
>>>>
>>>> The DCs are communicating over a gateway where I do NAT for ports 7000,
>>>> 9160 and 7199.
>>>>
>>>> I did a "nodetool repair" on a node in DC2 without any external load on
>>>> the system.
>>>>
>>>> It took 5 hrs to finish the Merkle tree calculations (which is fine for
>>>> me), but then in the streaming phase nothing happens (0% seen in "nodetool
>>>> netstats") and it stays like that forever. Note: it has to stream to/from
>>>> nodes in DC1!
>>>>
>>>> I tried another time and still the same.
>>>>
>>>> Looking around I found this thread
>>>>
>>>> http://www.mail-archive.com/user@cassandra.apache.org/msg22167.html
>>>> which seems to describe the same problem.
>>>>
>>>> The thread gives 2 suggestions:
>>>> - a full cluster restart allows the first attempted repair to complete
>>>> (haven't tested yet; this is not practical even if it works)
>>>> - issue https://issues.apache.org/jira/browse/CASSANDRA-4223 can be the
>>>> problem
>>>>
>>>> Questions:
>>>> 1) How can I make sure that the JIRA issue above is my real problem? (I
>>>> see no errors or warns in the logs; no other activity)
>>>> 2) What should I do to make the repairs work? (If the JIRA issue is the
>>>> problem, then I see there is a fix for it in version 1.0.11, which is not
>>>> released yet)
>>>>
>>>> Thanks,
>>>> Alex
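
The two broadcast scenarios Andras describes in this thread can be checked with a small model. This is only a sketch of the reasoning, not anything Cassandra does internally. The Node class, the function names and the nat_loopback table are all made up for illustration. It assumes private addresses are routable only inside their own DC, and that public addresses are routable from the other DC, or from the same DC only if that DC's router loops translated addresses back. DC2 is assumed to do so, since in the example DC2 nodes see each other as up while broadcasting public addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    dc: str
    private_ip: str
    public_ip: str


# Addresses taken from the example in the thread.
NODES = [
    Node("DC1:a", "DC1", "10.0.1.1", "1.2.3.4"),
    Node("DC1:b", "DC1", "10.0.1.2", "1.2.3.5"),
    Node("DC2:a", "DC2", "10.0.2.1", "1.2.4.4"),
    Node("DC2:b", "DC2", "10.0.2.2", "1.2.4.5"),
]


def advertised(node, broadcast_public):
    """Address a node gossips, given a per-DC 'broadcast public?' choice."""
    return node.public_ip if broadcast_public[node.dc] else node.private_ip


def can_route(src, dst, dst_addr, nat_loopback):
    """Simplified routing rules (assumptions, see the text above)."""
    same_dc = src.dc == dst.dc
    if dst_addr == dst.private_ip:
        return same_dc  # private addresses never leave their DC
    return (not same_dc) or nat_loopback[dst.dc]


def ring_view(observer, nodes, broadcast_public, nat_loopback):
    """Return {node name: 'Up' or 'Down'} as seen from observer."""
    view = {}
    for n in nodes:
        if n == observer:
            continue
        addr = advertised(n, broadcast_public)
        ok = can_route(observer, n, addr, nat_loopback)
        view[n.name] = "Up" if ok else "Down"
    return view


# Assumed router behaviour: only DC2 loops public addresses back.
LOOPBACK = {"DC1": False, "DC2": True}

# Scenario 1: DC1 broadcasts public addresses.
case1 = {"DC1": True, "DC2": True}
# Scenario 2: DC1 broadcasts private addresses.
case2 = {"DC1": False, "DC2": True}

for label, case in (("DC1 public", case1), ("DC1 private", case2)):
    for observer in (NODES[0], NODES[2]):
        print(label, observer.name, ring_view(observer, NODES, case, LOOPBACK))
```

Under these assumptions the model reproduces both outcomes from the thread: with DC1 broadcasting public addresses, a DC1 node sees its DC1 neighbor as down, and with DC1 broadcasting private addresses, DC2 nodes see both DC1 nodes as down.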