Peter,

It sounds like what I might want to deploy in this case is a ring per datacenter, with the datacenters replicating to one another (to ensure they all have full copies of the data), but inside each datacenter-specific ring, have a handful of nodes that I write to with a CL of QUORUM (or thereabouts).
I've not looked at setting up rings to replicate with each other before... is that process pretty well documented/explained or is this a black box that I am slowly wading into? (watching Andrew's talk from Acunu now to get a better idea of this).

-R

On Mon, Nov 7, 2011 at 10:20 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > Thanks for the additional insight on this -- think of a CDN that needs to
> > respond to requests, distributed around the globe. Ultimately you would hope
> > that each edge location could respond as quickly as possible (RF=N) but if
> > each of the ring members keep open/active connections to each other, and a
> > request comes in to an edge location that does not contain a copy of the
> > data, does it request the data from the node that does, then cache it (in
> > the case of more requests coming into that edge location with the same
> > request) or does it reply once and forget it, requiring *each* subsequent
> > request to that node to always phone back home to the node that actually
> > contains it?
> > The CDN/edge-server scenario works particularly well to illustrate my goals,
> > if visualizing that helps.
> > Look forward to your thoughts.
>
> Nodes will never cache any data. Nodes have the data that they own
> according to the ring topology and the replication factor (to the
> extent that the data has been replicated); the node you happen to talk
> to is merely a "co-ordinator" of a request; essentially a proxy with
> intelligent routing to the correct hosts.
>
> In the CDN situation, if you're talking about e.g. having a group of
> servers in one "place" (network topologically distinct location, such
> as geographically distinct) then a better fit than RF=N is probably to
> use multi-site support and say that you want a certain number of
> copies for each location and have all clients talk to the most local
> "site".
>
> But that's assuming you want to try to model this using just
> Cassandra's replication to begin with. Dynamically caching wherever
> data is accessed is a good idea for a CDN use-case (probably), but is
> not something that Cassandra does itself, internally. It's really
> difficult to know what the best solution is for a CDN; and in your
> case you imply that it's really *not* a CDN and it's just an analogy
> ;)
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
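
For anyone following the thread, here is a rough sketch of the arithmetic behind "a certain number of copies for each location" combined with a CL of QUORUM. It is only an illustration, not Cassandra code: the datacenter names and the per-datacenter replica counts are made up, and the helper functions are hypothetical. The one fact it relies on is that a quorum is a majority of the replicas, i.e. floor(RF / 2) + 1, and that a quorum restricted to the local datacenter only counts that datacenter's replicas.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas` copies: floor(replicas / 2) + 1."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


# Hypothetical layout: three copies of every row in each of three datacenters.
# The names and counts are assumptions for illustration only.
replication = {"us-east": 3, "eu-west": 3, "ap-south": 3}


def local_quorum(replication: dict, datacenter: str) -> int:
    """Replicas that must acknowledge when only the local datacenter counts."""
    return quorum(replication[datacenter])


def global_quorum(replication: dict) -> int:
    """Replicas that must acknowledge when all datacenters count together."""
    return quorum(sum(replication.values()))


if __name__ == "__main__":
    for dc in replication:
        print(dc, "local quorum:", local_quorum(replication, dc))
    print("quorum across all datacenters:", global_quorum(replication))
```

With this layout a write that only waits on the local datacenter needs 2 acknowledgements, while a quorum across all 9 replicas needs 5, which means waiting on at least one remote datacenter. That difference is the main reason to have clients talk to the most local site.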