That sounds a little complicated. Do you want to get the data out for an off-node backup, or is it for processing in another system?
You may get by using:

* TTL to expire data via compaction
* snapshots for backups

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:

> Hi everyone and Happy New Year!
>
> I need advice for organizing data flow outside of my 3 node Cassandra 0.8.6
> cluster. I am configuring my keyspace to use the NetworkTopologyStrategy. I
> have 2 data centers each with a replication factor 1 (i.e. DC1:1; DC2:1) the
> configuration of the PropertyFileSnitch is:
>
> > ip_node1=DC1:RAC1
> > ip_node2=DC2:RAC1
> > ip_node3=DC1:RAC1
>
> I assign tokens like this:
> node1 = 0
> node2 = 1
> node3 = 85070591730234615865843651857942052864
>
> My write consistency level is ANY.
>
> My data sources are only inserting data in node1 & node3. Essentially what
> happens is that a replica of every input value will end up on node2. Node 2
> thus has a copy of the entire data written to the cluster. When Node2 starts
> getting full, I want to have a script which pulls it off-line and does a
> sequence of operations (compaction/snapshotting/exporting/truncating the CFs)
> in order to back up the data in a remote place and to free it up so that it
> can take more data. When it comes back on-line it will take hints from the
> other 2 nodes.
>
> This is how I plan on shipping data out of my cluster without any downtime or
> any major performance penalty. The problem is when I want to also truncate
> the CFs in node1 & node3 to also free them up of data. I don't know whether I
> can do this without any downtime or without any serious performance
> penalties. Is anyone using truncate to free up CFs of data? How efficient is
> this?
>
> Any observations or suggestions are much appreciated!
>
> Cheers,
> Alex
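
For reference, the token assignment in the quoted message looks like the common RandomPartitioner layout for NetworkTopologyStrategy: tokens evenly spaced over the 0..2^127 range within each data center, with the second data center shifted by a small offset so no two nodes share a token. A minimal Python sketch of that arithmetic follows, assuming RandomPartitioner; the names RING_SIZE and dc_tokens are made up for illustration and are not part of Cassandra.

```python
# Sketch only: reproduces the token values from the quoted message,
# assuming RandomPartitioner (token space 0 .. 2**127).
RING_SIZE = 2 ** 127


def dc_tokens(node_count, offset=0):
    """Evenly spaced tokens for one data center, shifted by `offset`.

    Hypothetical helper for illustration; not a Cassandra API.
    """
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE
            for i in range(node_count)]


dc1 = dc_tokens(2)            # node1 and node3 in DC1
dc2 = dc_tokens(1, offset=1)  # node2 in DC2, offset by 1 to avoid a clash

print("DC1:", dc1)
print("DC2:", dc2)
```

With two nodes in DC1 and one in DC2 this yields 0 and 85070591730234615865843651857942052864 for DC1 and 1 for DC2, which matches the values listed for node1, node3 and node2.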