Thanks. I'm still working on the problem, so I will post anything I find out here.
Yes, you're right, that is the question I am asking. No, adding more storage is not a solution since New York would have several hundred times more storage.

On Apr 14, 2011 6:38 AM, "aaron morton" <aa...@thelastpickle.com> wrote:
> I think your question is "NY is the archive, after a certain amount of time we want to delete the row from the original DC but keep it in the archive in NY."
>
> Once you delete a row, it's deleted as far as the client is concerned. GCGraceSeconds is only concerned with when the tombstone marker can be removed. If NY has a replica of a row from Tokyo and the row is deleted in either DC, it will be deleted in the other DC as well.
>
> Some thoughts...
> 1) Add more storage in the satellite DCs, then tilt your chair to celebrate a job well done :)
> 2) Run two clusters as you say.
> 3) Just thinking out loud, and I know this does not work now. Would it be possible to support per CF strategy options, so an archive CF only replicates to NY? Can think of possible problems with repair and LOCAL_QUORUM, out of interest what else would it break?
>
> Hope that helps.
> Aaron
>
> On 14 Apr 2011, at 10:17, Patrick Julien wrote:
>
>> We have been successful in implementing, at scale, the comments you
>> posted here. I'm wondering what we can do about deleting data
>> however.
>>
>> The way I see it, we have considerably more storage capacity in NY,
>> but not in the other sites. Using this technique here, it occurs to
>> me that we would replicate non-NY deleted rows back to NY. Is there a
>> way to tell NY not to tombstone rows?
>>
>> The ideas I have so far:
>>
>> - Set GCGracePeriod to be much higher in NY than in the other sites.
>> This way we can get to tombstone'd rows well beyond their disk life in
>> other sites.
>> - A variant on this solution is to set the TTL on rows in non-NY sites
>> and again, set the GCGracePeriod to be considerably higher in NY
>> - Break this up into multiple clusters and do one write from the client
>> to its 'local' cluster and one write to the NY cluster.
>>
>> On Fri, Apr 8, 2011 at 7:15 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> No, I'm suggesting you have a Tokyo keyspace that gets replicated as
>>> {Tokyo: 2, NYC: 1}, a London keyspace that gets replicated to {London:
>>> 2, NYC: 1}, for example.
>>>
>>> On Fri, Apr 8, 2011 at 5:59 PM, Patrick Julien <pjul...@gmail.com> wrote:
>>>> I'm familiar with this material. I hadn't thought of it from this
>>>> angle but I believe what you're suggesting is that the different data
>>>> centers would hold a different properties file for node discovery
>>>> instead of using auto-discovery.
>>>>
>>>> So Tokyo, and others, would have a configuration that makes it
>>>> oblivious to the non New York data centers.
>>>> New York would have a configuration that would give it knowledge of no
>>>> other data center.
>>>>
>>>> Would that work? Wouldn't the NY data center wonder where these other
>>>> writes are coming from?
>>>>
>>>> On Fri, Apr 8, 2011 at 6:38 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>> On Fri, Apr 8, 2011 at 12:17 PM, Patrick Julien <pjul...@gmail.com> wrote:
>>>>>> The problem is this: we would like the historical data from Tokyo to
>>>>>> stay in Tokyo and only be replicated to New York. The one in London
>>>>>> to be in London and only be replicated to New York and so on for all
>>>>>> data centers.
>>>>>>
>>>>>> Is this currently possible with Cassandra? I believe we would need to
>>>>>> run multiple clusters and migrate data manually from data centers to
>>>>>> North America to achieve this. Also, any suggestions would also be
>>>>>> welcomed.
>>>>>
>>>>> NetworkTopologyStrategy allows configuring replicas per-keyspace,
>>>>> per-datacenter:
>>>>> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>> http://www.datastax.com
>>>>>
>>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>
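
For anyone reading this thread later, Jonathan's per-keyspace suggestion quoted above (a Tokyo keyspace replicated as {Tokyo: 2, NYC: 1}, a London keyspace as {London: 2, NYC: 1}) might be sketched roughly as below. This is only an illustration, not something anyone in the thread posted: the helper `create_keyspace_cql` and the keyspace names are hypothetical, the output uses the later CQL `CREATE KEYSPACE` syntax rather than the tooling available at the time of the thread, and the data-center names are assumed to match whatever names the cluster's snitch reports.

```python
# Hypothetical sketch: builds one keyspace definition per satellite data
# center, each replicated locally and to the NYC archive only.
# The replica counts (2 local, 1 in NYC) come from the thread; the helper
# name and the keyspace names are made up for illustration.

def create_keyspace_cql(keyspace, local_dc, local_replicas=2,
                        archive_dc="NYC", archive_replicas=1):
    """Return a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    Data centers that are not listed in the replication map hold no
    replicas of the keyspace, so Tokyo data never lands in London.
    """
    replication = {"class": "NetworkTopologyStrategy", local_dc: local_replicas}
    if local_dc != archive_dc:
        replication[archive_dc] = archive_replicas

    # Strings are quoted, replica counts are emitted as bare integers.
    options = ", ".join(
        f"'{key}': '{value}'" if isinstance(value, str) else f"'{key}': {value}"
        for key, value in replication.items()
    )
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{options}}};"


if __name__ == "__main__":
    for dc in ("Tokyo", "London"):
        print(create_keyspace_cql(f"{dc.lower()}_history", dc))
```

Note that this only addresses where replicas live. As Aaron points out above, a delete issued against one of these keyspaces is still replicated to NYC, so this layout alone does not keep archived rows in New York after they are deleted at the satellite site.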