That sounds a little complicated. 

Do you want to get the data out for an off-node backup, or is it for 
processing in another system?

You may get by using:

* TTL to expire data via compaction
* snapshots for backups

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:

> Hi everyone and Happy New Year!
> 
> I need advice on organizing the data flow out of my 3-node Cassandra 0.8.6 
> cluster. I am configuring my keyspace to use NetworkTopologyStrategy. I 
> have 2 data centers, each with a replication factor of 1 (i.e. DC1:1; 
> DC2:1). The configuration of the PropertyFileSnitch is:
>     ip_node1=DC1:RAC1
>     ip_node2=DC2:RAC1
>     ip_node3=DC1:RAC1
> I assign tokens like this:
>     node1 = 0
>     node2 = 1
>     node3 = 85070591730234615865843651857942052864
> 
> My write consistency level is ANY.
> 
> My data sources insert data only into node1 & node3. Essentially, a replica 
> of every input value ends up on node2, so node2 holds a copy of all the data 
> written to the cluster. When node2 starts getting full, I want a script that 
> takes it off-line and runs a sequence of operations 
> (compaction/snapshotting/exporting/truncating the CFs) to back up the data to 
> a remote location and free up space so that it can take more data. When it 
> comes back on-line it will receive hints from the other 2 nodes.
> 
> This is how I plan to ship data out of my cluster without any downtime or 
> any major performance penalty. The problem arises when I also want to 
> truncate the CFs on node1 & node3 to free up space there as well. I don't 
> know whether I can do this without downtime or a serious performance 
> penalty. Is anyone using truncate to clear data out of CFs? How efficient is 
> it?
> 
> Any observations or suggestions are much appreciated!
> 
> Cheers,
> Alex
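
The token layout and the "everything lands on node2" behaviour described in 
the quoted message can be checked with a small sketch. It assumes the 
RandomPartitioner token space of 2**127 (the node3 token is exactly half of 
that) and uses a deliberately simplified model of NetworkTopologyStrategy 
placement: walk the ring clockwise from the key's token and take the first 
node found in each data center, ignoring racks. The helper names dc_tokens 
and replicas are hypothetical and not part of Cassandra.

```python
from bisect import bisect_left

# Assumption: RandomPartitioner, whose token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def dc_tokens(node_count, offset=0):
    """Evenly spaced tokens for one data center, shifted by a small offset."""
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


# Topology from the quoted message: two nodes in DC1, one node in DC2.
dc1 = dc_tokens(2)            # node1 and node3
dc2 = dc_tokens(1, offset=1)  # node2, offset by 1 so it does not collide with node1

# The ring as (token, node, data center), sorted by token.
ring = sorted([
    (dc1[0], "node1", "DC1"),
    (dc2[0], "node2", "DC2"),
    (dc1[1], "node3", "DC1"),
])


def replicas(key_token):
    """Simplified placement with one replica per data center.

    Starts at the first node whose token is >= key_token (wrapping around)
    and keeps the first node encountered in each data center.
    """
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    chosen = {}
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        chosen.setdefault(dc, node)
    return chosen


if __name__ == "__main__":
    print("DC1 tokens:", dc1)
    print("DC2 tokens:", dc2)
    for key_token in (1, 12345, RING_SIZE // 2 + 5):
        print(key_token, "->", replicas(key_token))
```

Under these assumptions every key gets its DC2 replica on node2, because 
node2 is the only node in DC2, while the DC1 replica alternates between 
node1 and node3 depending on the key's token.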
