On Sat, 2010-04-10 at 10:49 +1200, Todd Nine wrote:
> I want the data that is written from the different partitioned
> processing nodes (our c# app servers) to be available in both data
> centres.  I'm assuming I would need an equal number of nodes at each
> data centre, then use the RackAwareStrategy so that data is replicated
> across both locations. Both locations would need the same cluster
> name, is this correct?

Right.

> Is there a way to secure the communication between data centres?
> Given that they will be on different sides of the world, I can't
> guarantee a secure channel between them.

You'll need to use a VPN of some sort between data-centers.

> How is authorization of a new node in a cluster accomplished (if
> possible)? Is it currently done via firewall and cluster IPs, or can
> that be managed in Cassandra internally?

The former. The assumption is made that your nodes are on a trusted
network.

> Is there any sort of management interface for deploying nodes and
> configuring peers?

No, but there is almost nothing to it. With the exception of the
ListenAddress and RPCAddress directives, the configurations will be
identical on all nodes (and there are plenty of existing tools for
copying/syncing configs).
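
To illustrate, here is a minimal sketch of stamping out per-node configs
from one shared template. The XML-style fragment, the render_config helper,
the cluster name, and the IP addresses are all assumptions made for the
example; this is not a complete Cassandra configuration.

```python
from string import Template

# Illustrative fragment only: the element layout is an assumption and
# a real config file contains many more settings.
CONFIG_TEMPLATE = Template(
    "<ClusterName>$cluster_name</ClusterName>\n"
    "<ListenAddress>$listen_address</ListenAddress>\n"
    "<RPCAddress>$rpc_address</RPCAddress>\n"
)


def render_config(cluster_name, listen_address, rpc_address):
    """Return the config text for one node (hypothetical helper)."""
    return CONFIG_TEMPLATE.substitute(
        cluster_name=cluster_name,
        listen_address=listen_address,
        rpc_address=rpc_address,
    )


if __name__ == "__main__":
    # Hypothetical node addresses; everything else is shared.
    for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
        print(f"--- config for {ip} ---")
        print(render_config("ExampleCluster", ip, ip))
```

Only the two address lines differ between the rendered files, which is why
an ordinary config-sync tool is enough.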

> For ease of administration if I have 10 nodes or more, can I have 2
> peer IP addresses per node in its configuration, and deploy the nodes
> in overlapping groups of 3?  I'm assuming once a node connects to
> another, it automatically receives all node information about the
> cluster, is this correct?

Nodes discover one another by means of Gossip, and for that to work all
a new node needs is knowledge of at least one other. This is as simple
as picking a handful of stable nodes in your cluster (2 or 3 is fine)
and configuring them as Seeds.
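
The idea can be sketched with a toy simulation. This is not Cassandra's
actual gossip implementation, just an illustration of why knowing a seed or
two is sufficient: each node starts out knowing only itself and the seeds,
and repeatedly merges membership views with one randomly chosen known peer.
The node names, function names, and round limit are all made up for the
example.

```python
import random


def gossip_round(views, rng):
    """Each node swaps its membership view with one peer it already knows."""
    for node in list(views):
        peers = [p for p in views[node] if p != node]
        if not peers:
            continue
        peer = rng.choice(peers)
        merged = views[node] | views[peer]
        # Both sides end the exchange with the union of what they knew.
        views[node] = set(merged)
        views[peer] = set(merged)


def simulate(num_nodes=10, seeds=("node0", "node1"), rng_seed=42,
             max_rounds=1000):
    """Return (rounds_needed, views); rounds_needed is None if not converged.

    The seeds must be names of nodes in the cluster (node0..nodeN-1).
    """
    rng = random.Random(rng_seed)
    names = [f"node{i}" for i in range(num_nodes)]
    # A new node's only prior knowledge is itself plus the seed list.
    views = {name: {name, *seeds} for name in names}
    everyone = set(names)
    for rounds in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return rounds, views
    return None, views


if __name__ == "__main__":
    rounds, views = simulate()
    print("rounds until every node knew the full cluster:", rounds)
```

No node is ever configured with the full member list; the seeds just give
each newcomer a first contact, and the rest spreads through the exchanges.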

> Last, are there any tools out there that allow user data mining?
> We'll obviously need to document how our application persists data
> well so that external applications can read the data.  Our sales and
> accounting teams use our current MS SQL system to perform some data
> mining via SQL.  Giving them an interface to allow them to query data
> (in any query language) is a must for our migration. 

Nothing to perform data mining per se; you might have a look at Chiton
(http://github.com/driftx/chiton).

-- 
Eric Evans
eev...@rackspace.com
