> One final question: should I add new nodes as Brisk instances instead of my 
> home brew cassandra + hadoop nodes?  I've obviously already put in the 
> pain/effort of learning how to run hadoop + cassandra…
Yes, it will make your life easier.

>  create keyspace civicscience with replication_factor=3 and strategy_options 
> = [{us-east:3}] and 
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';
FYI, the replication_factor property is not valid with NTS (the per-DC RFs go 
in strategy_options); the next(?) revision of 0.8 will raise an error on restart.

> I'm wondering if I write my own snitch that extends Ec2Snitch with overrides 
> as follows:
> getDC = if (AZ == c || AZ == d) return us-east (to keep current nodes the 
> same) else return us-east-hadoop;
> getRack = return super(); (returning a,b,c,d seems ok)
It's probably easier to use the PropertyFileSnitch; see the yaml file and 
conf/cassandra-topology.properties. You can then manually put each node into 
the DC and rack you want.
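
To make the PropertyFileSnitch idea concrete: as far as I know, each line of 
cassandra-topology.properties maps a node IP to DC:rack, with a default= entry 
for nodes that aren't listed. Below is a standalone Java sketch of that lookup 
only. TopologyLookup and all the IPs are made up, and it is not Cassandra's own 
code. It puts your c/d nodes in us-east and the new a/b nodes in 
us-east-hadoop, with racks named after the AZ letter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of a PropertyFileSnitch-style lookup, NOT Cassandra's code:
// each entry maps a node IP to "DC:rack", and "default" covers unlisted nodes.
class TopologyLookup {
    private final Properties props = new Properties();

    TopologyLookup(String propertiesText) {
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {dc, rack} for the IP, falling back to the "default" entry.
    String[] lookup(String ip) {
        String value = props.getProperty(ip, props.getProperty("default"));
        if (value == null) {
            throw new IllegalArgumentException("no entry for " + ip + " and no default");
        }
        String[] parts = value.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected DC:rack but got " + value);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up IPs; racks reuse the availability zone letters.
        String topology = String.join("\n",
                "# existing Cassandra-only nodes in us-east-1c / us-east-1d",
                "10.0.0.1=us-east:c",
                "10.0.0.2=us-east:d",
                "# new Hadoop (task tracker) nodes in us-east-1a / us-east-1b",
                "10.0.1.1=us-east-hadoop:a",
                "10.0.1.2=us-east-hadoop:b",
                "default=us-east:c");
        TopologyLookup snitch = new TopologyLookup(topology);
        for (String ip : new String[] {"10.0.0.1", "10.0.1.2", "10.0.9.9"}) {
            String[] dcRack = snitch.lookup(ip);
            System.out.println(ip + " -> DC " + dcRack[0] + ", rack " + dcRack[1]);
        }
    }
}
```

In this sketch an unlisted node silently falls through to default=, so a new 
Hadoop node you forget to list would end up in us-east. Listing every node 
explicitly avoids that surprise.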

> -Is the overall RF=3 still ok?
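You will need to set the RF for each DC. With NTS there is no single overall 
RF; the total number of replicas is the sum of the per-DC values.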
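
With NetworkTopologyStrategy the per-DC values in strategy_options are the 
RF, so the proposed [{us-east:2, us-east-hadoop:1}] still keeps 3 copies of 
each row in total, just split across the two DCs. A trivial Java sketch of 
that arithmetic (PerDcReplication is a made-up name):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: with NetworkTopologyStrategy each DC's entry in strategy_options is
// that DC's replication factor, and a row's total copies is the sum over DCs.
class PerDcReplication {
    static int totalReplicas(Map<String, Integer> strategyOptions) {
        int total = 0;
        for (Map.Entry<String, Integer> e : strategyOptions.entrySet()) {
            if (e.getValue() < 0) {
                throw new IllegalArgumentException("negative RF for " + e.getKey());
            }
            total += e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = new LinkedHashMap<>();
        current.put("us-east", 3);
        Map<String, Integer> proposed = new LinkedHashMap<>();
        proposed.put("us-east", 2);
        proposed.put("us-east-hadoop", 1);
        System.out.println("current:  " + current + " -> " + totalReplicas(current) + " replicas");
        System.out.println("proposed: " + proposed + " -> " + totalReplicas(proposed) + " replicas");
    }
}
```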

> -Can I (how do I safely) change the keyspace strategy_options from 
> [{us-east:3}] to [{us-east:2, us-east-hadoop:1}]   This seems like the 
> riskiest/most complicated step of everything I've proposed...
http://wiki.apache.org/cassandra/Operations#Replication
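
That page says to run nodetool repair after increasing the RF and nodetool 
cleanup after decreasing it. Going from [{us-east:3}] to [{us-east:2, 
us-east-hadoop:1}] does both at once, one per DC. The change itself would 
presumably be an update keyspace civicscience with strategy_options = 
[{us-east:2, us-east-hadoop:1}]; in cassandra-cli, mirroring your create 
statement minus replication_factor. The Java below only works out which 
follow-up applies to which DC and does not talk to Cassandra. RfChangePlan is 
hypothetical, and treating a DC missing from strategy_options as RF 0 is an 
assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not a Cassandra API: compares old and new strategy_options
// per DC. RF up -> "repair" (new replicas must be populated); RF down -> "cleanup"
// (surplus replicas can be removed). A DC absent from the options counts as RF 0.
class RfChangePlan {
    static Map<String, String> plan(Map<String, Integer> before, Map<String, Integer> after) {
        Set<String> dcs = new TreeSet<>(before.keySet());
        dcs.addAll(after.keySet());
        Map<String, String> actions = new TreeMap<>();
        for (String dc : dcs) {
            int oldRf = before.getOrDefault(dc, 0);
            int newRf = after.getOrDefault(dc, 0);
            if (newRf > oldRf) {
                actions.put(dc, "repair");
            } else if (newRf < oldRf) {
                actions.put(dc, "cleanup");
            } else {
                actions.put(dc, "none");
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Integer> before = new TreeMap<>();
        before.put("us-east", 3);
        Map<String, Integer> after = new TreeMap<>();
        after.put("us-east", 2);
        after.put("us-east-hadoop", 1);
        System.out.println(plan(before, after)); // {us-east=cleanup, us-east-hadoop=repair}
    }
}
```

For your change it reports cleanup for us-east and repair for us-east-hadoop: 
the new DC needs a repair to get its copy, and us-east can then drop its 
surplus third replica. Doing the repair before the cleanup seems the more 
conservative order.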

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2011, at 6:05 AM, William Oberman wrote:

> I was hoping to transition my "simple" cassandra cluster (where each node is 
> a cassandra + hadoop tasktracker) to a cluster with two virtual datacenters 
> (vanilla cassandra vs. cassandra + hadoop tasktracker), based on this:
> http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig
> The problem I'm having is that my hadoop jobs are getting heavy enough that 
> they're affecting user-facing performance on my cluster.
> 
> Right now I'm in AWS, and I have 4 nodes in us-east split over two 
> availability zones ("us-east-1c" that I'll call "c" and "us-east-1d" that 
> I'll call "d"), set up with this keyspace:
> create keyspace civicscience with replication_factor=3 and strategy_options = 
> [{us-east:3}] and 
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';
> And I'm using the Ec2Snitch.
> 
> I'm wondering if I write my own snitch that extends Ec2Snitch with overrides 
> as follows:
> getDC = if (AZ == c || AZ == d) return us-east (to keep current nodes the 
> same) else return us-east-hadoop;
> getRack = return super(); (returning a,b,c,d seems ok)
> 
> Then, if I boot N new nodes into us-east-1[a,b] they will be "hadoop" nodes 
> because of the snitch.  I'll obviously have to change my home brew cassandra 
> + hadoop instances to selectively run task trackers or not (a/b = yes, and 
> c/d = no).
> 
> But:
> -Is the overall RF=3 still ok?
> -What is the recommended split between "normal" and "hadoop" in terms of 
> strategy_options (assuming RF=3)?  2/1?  
> -Can I (how do I safely) change the keyspace strategy_options from 
> [{us-east:3}] to [{us-east:2, us-east-hadoop:1}]   This seems like the 
> riskiest/most complicated step of everything I've proposed...
> -After I change the options, what (if anything) would I have to do to migrate 
> data around?  
> 
> One final question: should I add new nodes as Brisk instances instead of my 
> home brew cassandra + hadoop nodes?  I've obviously already put in the 
> pain/effort of learning how to run hadoop + cassandra...
> 
> Thanks for any help/advice!
> 
> will
> 
