Hi guys,
 I'm trying out DSE and looking for the best way to arrange the cluster. I
have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6
outside the gateway that are supposed to take replicas from the other 3 and
serve reads and analytics jobs.

1. Is it OK to run the 3 nodes as normal Cassandra nodes and run the other
6 nodes as analytics? Can I serve both real-time reads and M/R jobs from
the 6 nodes? How will these affect each other performance-wise?

I know the system is meant to be used with analytics separated from
real-time queries. I've already explored a possible 3-DC setup with Tyler in
another message, and it does work, but I'm afraid it is too complex. It would
also require me to send 2 replicas across the firewall, which can't handle
that very well at peak times and would affect other applications.

2. I started the cluster in the setup described in 1 (3 normal, 6
analytics) and as soon as the Analytics nodes start up they start
outputting this message:

INFO [TASK-TRACKER-INIT] 2012-04-03 17:54:59,575 Client.java (line 629)
Retrying connect to server: IP_OF_NORMAL_CASSANDRA_SEED_NODE:8012. Already
tried 10 time(s).
....

So it seems my analytics nodes are trying to contact the normal Cassandra
seed node on port 8012 which I read is a "Hadoop Job Tracker client port".
It doesn't seem like this is the normal behavior. Why is it getting
confused? In the .yaml of each node I'm using endpoint_snitch:
com.datastax.bdp.snitch.DseSimpleSnitch and listing the Analytics seed
node before the normal Cassandra seed node in the seeds.
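
For reference, the relevant part of the .yaml on each node looks roughly like
the sketch below. The two addresses are placeholders rather than my real IPs,
and the seed_provider block assumes the stock SimpleSeedProvider layout from
the default cassandra.yaml; only the snitch class and the seed ordering are
taken from my description above.

```yaml
# Sketch only: placeholder addresses, default SimpleSeedProvider assumed.
endpoint_snitch: com.datastax.bdp.snitch.DseSimpleSnitch

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Analytics seed listed first, normal Cassandra seed second.
          - seeds: "IP_OF_ANALYTICS_SEED_NODE,IP_OF_NORMAL_CASSANDRA_SEED_NODE"
```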

Cheers,
Alex
