You might check out some of the stuff Netflix does with their Cassandra backup and Cassandra ETL tools:
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
http://techblog.netflix.com/2012/02/announcing-priam.html


-Jeremiah

On 02/29/2012 11:04 AM, Alexandru Sicoe wrote:



On Sun, Feb 26, 2012 at 8:24 PM, aaron morton <aa...@thelastpickle.com> wrote:

    All nodes in the cluster need two-way communication. Nodes need to
    gossip with each other so they know the others are alive.

    If you need to dump a lot of data, consider the Hadoop integration
    (http://wiki.apache.org/cassandra/HadoopSupport). It can run a bit
    faster than going through the Thrift API.


Thanks for the suggestion, I will look into it.


    Copying sstables may be another option depending on the data size.


The problem with this is that an SSTable, from what I understand, is per CF. Since I want to do semi-real-time replication of just the latest data added, this won't work, because I would be copying over all the data in the CF.

Cheers,
A


    Cheers


    -----------------
    Aaron Morton
    Freelance Developer
    @aaronmorton
    http://www.thelastpickle.com

    On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote:

    Hello everyone,

    I'm battling with this constraint: I need to regularly ship
    time-series data out of a Cassandra cluster that sits inside an
    enclosed network.

    I tried selecting all the data within a certain time window,
    writing it to a file, and then copying the file out, but this hurts
    I/O performance because even for a small time window (say
    5 minutes) I am hitting more than a million rows.

    It would really help if I used Cassandra to replicate the data
    automatically outside. The problem is they will only allow me to
    have outbound traffic out of the enclosed network (not inbound).
    Is there any way to configure the cluster or have 2 data centers
    in such a way that the data center (node or cluster) outside of
    the enclosed network only gets a replica of the data, without
    ever needing to communicate anything back?

    I appreciate the help,
    Alex

