Hi.

Thanks a lot for your help. One of the problems I have is that no-one here has clarified how important this data is. I'm working on the assumption that it's 'somewhat important', but not critical (nothing financial or transactional). So I don't need quorum; in fact, as the read/write ratio will easily be 1:100000, I think a write consistency level of ANY would be fine.
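When I say ANY, I mean setting it per write from the client rather than anywhere in the server configuration. For what it's worth, the writes would look roughly like this with pycassa (the keyspace, column family and address below are placeholders I've made up, so treat it as a sketch rather than working code):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.cassandra.ttypes import ConsistencyLevel

    # Placeholder keyspace/column family names; the contact point is one
    # of the three production nodes. ANY lets a write succeed even if only
    # a hinted handoff could be recorded.
    pool = ConnectionPool('MyKS', server_list=['192.168.1.1:9160'])
    events = ColumnFamily(pool, 'Events',
                          write_consistency_level=ConsistencyLevel.ANY)
    events.insert('some-row-key', {'column': 'value'})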
So if I used OldNetworkTopologyStrategy* (which, if the book is correct, places one replica in the second data centre and the rest in the first) and set a replication factor of 2 for the production data centre and 1 for the offsite one, with the data centres defined in the snitch's property file, then I should have a production cluster that could cope with the loss of a node, and an offsite node that (eventually) has all the data on it. However, as I'm not using quorum, there would be no guarantee that I could recover all the data if a production node went down, or if I had to use or recover from the offsite node. I could even add a second offsite node if that helped, but adding any more nodes at either site is limited by both cost and space.

What really confuses me here is the emphasis on racks. How does replica placement work when you only have one rack at each data centre?
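To make that concrete, this is roughly the set-up I have in mind. The addresses and keyspace name are invented, and I'm assuming the newer NetworkTopologyStrategy is the one that takes per-data-centre replication factors, so please correct me if I've got the 0.7 syntax wrong.

cassandra-topology.properties for the PropertyFileSnitch, with one rack per data centre:

    # three production nodes, all in the same rack
    192.168.1.1=PRODUCTION:RAC1
    192.168.1.2=PRODUCTION:RAC1
    192.168.1.3=PRODUCTION:RAC1
    # single offsite node
    10.1.1.1=OFFSITE:RAC1
    # catch-all for unlisted nodes
    default=PRODUCTION:RAC1

and the keyspace, created through cassandra-cli:

    create keyspace MyKS
        with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
        and strategy_options = [{PRODUCTION:2, OFFSITE:1}];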
* By the way, I love the name change from something whose behaviour you could guess from the name (RackAwareStrategy) to something you need to look up each time (OldNetworkTopologyStrategy).

Again, thanks for your help.

Brian


On Wed, 2011-03-30 at 09:00 +1100, aaron morton wrote:
> Snapshots use hard links and do not take additional disk space:
> http://www.mail-archive.com/user@cassandra.apache.org/msg11028.html
>
> WRT losing a node, it's not the total number of nodes that's important,
> it's the number of replicas. If you have 3 nodes with RF 2 and you lose
> one of the replicas you will not be able to work at Quorum level.
>
> You *may* be able to use the NetworkTopologySnitch to have 2 replicas
> in DC1 and 1 replica in DC2. Then use the property file snitch to only
> put one node in the second DC. Finally, work against DC1 with
> LOCAL_QUORUM so you do not wait on DC2 and you can tolerate the link
> to DC2 failing. That also means there is no guarantee DC2 is up to
> date. If you were to ship snapshots you would have a better idea of
> what you had in DC2.
>
> FWIW I'm not convinced that setting things up so that one node gets
> *all* the data in DC2 is a good idea. It would make an offsite replica
> that could only work at essentially CL ONE and would require a lot of
> streaming to move to a cluster with more nodes. I don't have time
> right now to think through all of the implications (I may be able to
> do some more thinking tonight), but the DataStax guide creates a warm
> failover that is ready to work. I'm not sure what this approach would
> give you in case of failure: a backup to be restored or a failover
> installation.
>
> Hope that helps.
> Aaron
>
> On 30 Mar 2011, at 00:38, Brian Lycett wrote:
>
> > Hi.
> >
> > Cheers for your reply.
> >
> > Unfortunately there's too much data for snapshots to be practical.
> > The data set will be at least 400GB initially, and the offsite node
> > will be on a 20Mbit leased line.
> >
> > However, I don't need the consistency level to be quorum for
> > reads/writes in the production cluster, so am I right in still
> > assuming that a replication factor of 2 in a three-node cluster
> > allows for one node to die without data loss?
> >
> > If that's the case, I still don't understand how to ensure that the
> > offsite node will get a copy of the whole data set. I've read through
> > the O'Reilly book, and that doesn't seem to address this scenario
> > (unless I still don't get the Cassandra basics at a fundamental
> > level).
> >
> > Does anyone know any tutorials/examples of such a set-up that would
> > help me out?
> >
> > Cheers,
> >
> > Brian
> >
> > On Tue, 2011-03-29 at 21:56 +1100, aaron morton wrote:
> > > Be aware that at RF 2 the Quorum is 2, so you cannot afford to lose
> > > a replica when working at Quorum. 3 is really the starting point if
> > > you want some redundancy.
> > >
> > > If you want to get your data offsite how about doing snapshots and
> > > moving them off site:
> > > http://wiki.apache.org/cassandra/Operations#Consistent_backups
> > >
> > > The guide from DataStax will give you a warm failover site, which
> > > sounds a bit more than what you need.
> > >
> > > Hope that helps.
> > > Aaron
> > >
> > > On 28 Mar 2011, at 22:47, Brian Lycett wrote:
> > >
> > > > Hello.
> > > >
> > > > I'm setting up a cluster that has three nodes in our production
> > > > rack. My intention is to have a replication factor of two for
> > > > this. For disaster recovery purposes, I need to have another node
> > > > (or two?) off-site.
> > > >
> > > > The off-site node is entirely for the purpose of having an
> > > > offsite backup of the data - no clients will connect to it.
> > > >
> > > > My question is, is it possible to configure Cassandra so that the
> > > > offsite node will have a full copy of the data set? That is,
> > > > somehow guarantee that a replica of all data will be written to
> > > > it, but without having to resort to an ALL consistency level for
> > > > writes? Although the offsite node will be on a 20Mbit leased
> > > > line, I'd rather not have the risk that the link goes down and
> > > > breaks the cluster.
> > > >
> > > > I've seen this suggestion here:
> > > > http://www.datastax.com/docs/0.7/operations/datacenter#disaster
> > > > but that configuration is vulnerable to the link breaking, and
> > > > uses four nodes in the offsite location.
> > > >
> > > > Regards,
> > > >
> > > > Brian