On Sun, Feb 26, 2012 at 8:24 PM, aaron morton <aa...@thelastpickle.com> wrote:

> All nodes in the cluster need two way communication. Nodes need to
> gossip to each other so they know they are alive.
>
> If you need to dump a lot of data consider the Hadoop integration.
> http://wiki.apache.org/cassandra/HadoopSupport It can run a bit faster
> than going through the thrift api.
>

Thanks for the suggestion, I will look into it.

> Copying sstables may be another option depending on the data size.
>

The problem with this is that an SSTable, from what I understand, is per
CF. Since I want to do a semi-real-time replication of just the latest
data added, this won't work because I would be copying over all the data
in the CF.

Cheers,
A

>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote:
>
> Hello everyone,
>
> I'm battling with this constraint that I have: I need to regularly ship
> time series data out of a Cassandra cluster that sits within an enclosed
> network, to a destination outside of that network.
>
> I tried selecting all the data within a certain time window, writing it
> to a file, and then copying the file out, but this hurts I/O performance
> because even for a small time window (say 5 mins) I am hitting more than
> a million rows.
>
> It would really help if I could use Cassandra to replicate the data
> automatically outside. The problem is they will only allow me to have
> outbound traffic out of the enclosed network (not inbound). Is there any
> way to configure the cluster or have 2 data centers in such a way that the
> data center (node or cluster) outside of the enclosed network only gets a
> replica of the data, without ever needing to communicate anything back?
>
> I appreciate the help,
> Alex
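
The "ship only the latest data" idea discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested Cassandra integration: the row shape `(timestamp_ms, series_key, value)`, the function names, and the watermark scheme are all hypothetical, and how the rows are actually read from the cluster (thrift, Hadoop, or otherwise) is deliberately left out. The point is that each run exports only rows newer than the last exported timestamp and compresses the batch before it is pushed outbound.

```python
import csv
import gzip
import io
from typing import Iterable, List, Tuple

# Hypothetical row shape: (timestamp in ms, series key, value).
Row = Tuple[int, str, float]


def select_new_rows(rows: Iterable[Row], watermark_ms: int) -> List[Row]:
    """Return only rows strictly newer than the watermark, oldest first."""
    return sorted((r for r in rows if r[0] > watermark_ms), key=lambda r: r[0])


def export_batch(rows: List[Row]) -> bytes:
    """Serialize a batch as gzip-compressed CSV, ready to be pushed outbound."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return gzip.compress(out.getvalue().encode("utf-8"))


def next_watermark(rows: List[Row], watermark_ms: int) -> int:
    """Advance the watermark to the newest exported timestamp, if any."""
    return max((r[0] for r in rows), default=watermark_ms)
```

Each export cycle would call `select_new_rows` with the stored watermark, send the bytes from `export_batch` over the outbound-only link, and persist the result of `next_watermark` only after the transfer succeeds, so a failed push is retried on the next run.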