On Sun, Feb 26, 2012 at 8:24 PM, aaron morton <aa...@thelastpickle.com> wrote:

> All nodes in the cluster need two-way communication. Nodes need to
> gossip with each other so they know which nodes are alive.
>
> If you need to dump a lot of data, consider the Hadoop integration:
> http://wiki.apache.org/cassandra/HadoopSupport It can run a bit faster
> than going through the Thrift API.
>

Thanks for the suggestion, I will look into it.


> Copying sstables may be another option depending on the data size.
>

The problem with this is that an SSTable, from what I understand, is per
CF. Since I want to do a semi-real-time replication of just the latest
data added, this won't work because I would be copying over all the data in
the CF.
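
One possible way around this, sketched under assumptions rather than tested
against a real cluster: SSTable files are immutable once written, so a script
could remember which files it has already shipped and copy only the ones that
have appeared since. The function name ship_new_sstables, the manifest file,
and the directory arguments are all hypothetical. Compaction rewrites data into
new files, so some rows would be shipped more than once.

```python
import json
import shutil
from pathlib import Path


def ship_new_sstables(data_dir, outbound_dir, manifest_path):
    """Copy files not yet recorded in the manifest; return the names copied.

    Assumption: data_dir holds only completed SSTable files, for example the
    backups directory Cassandra fills when incremental_backups is enabled.
    """
    data_dir = Path(data_dir)
    outbound_dir = Path(outbound_dir)
    manifest_path = Path(manifest_path)

    # Names shipped on earlier runs (hypothetical JSON manifest format).
    shipped = set()
    if manifest_path.exists():
        shipped = set(json.loads(manifest_path.read_text()))

    outbound_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(data_dir.iterdir()):
        if f.is_file() and f.name not in shipped:
            shutil.copy2(f, outbound_dir / f.name)
            copied.append(f.name)

    # Record everything shipped so far so the next run skips it.
    manifest_path.write_text(json.dumps(sorted(shipped | set(copied))))
    return copied
```

Run periodically, this would only ever push files outward, which fits the
outbound-only restriction, but the receiving side would still need to load the
files into its own cluster.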

Cheers,
A


>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote:
>
> Hello everyone,
>
> I'm battling with this constraint: I need to regularly ship time series
> data out of a Cassandra cluster that sits within an enclosed network to a
> destination outside that network.
>
> I tried selecting all the data within a certain time window, writing it to
> a file, and then copying the file out, but this hurts I/O performance
> because even for a small time window (say 5 mins) I am hitting more than a
> million rows.
>
> It would really help if I used Cassandra to replicate the data
> automatically outside. The problem is they will only allow me to have
> outbound traffic out of the enclosed network (not inbound). Is there any
> way to configure the cluster or have 2 data centers in such a way that the
> data center (node or cluster) outside of the enclosed network only gets a
> replica of the data, without ever needing to communicate anything back?
>
> I appreciate the help,
> Alex
>
>
>
