> One final question: should I add new nodes as Brisk instances instead of my
> home brew cassandra + hadoop nodes? I've obviously already put in the
> pain/effort of learning how to run hadoop + cassandra...

Yes, make your life easier.

> create keyspace civicscience with replication_factor=3 and strategy_options
> = [{us-east:3}] and
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';

FYI, the replication_factor property is incorrect with the NetworkTopologyStrategy (NTS); the next(?) revision of 0.8 will raise an error on restart.

> I'm wondering if I write my own snitch that extends Ec2Snitch with overrides
> as follows:
> getDC = if(AZ == c || d) return return us-east (to keep current nodes the
> same) else return us-east-hadoop;
> getRack = return super(); (returning a,b,c,d seems ok)

Probably easier to use the PropertyFileSnitch; see the yaml file and conf/cassandra-topology.properties. You can then manually put the nodes into the DC and rack you want.

> -Is the overall RF=3 still ok?

With NTS there is no single overall RF; you will need to set the RF for each DC in strategy_options.

> -Can I (how do I safely) change the keyspace strategy_options from
> [{us-east:3}] to [{us-east:2, us-east-hadoop:1}] This seems like the
> riskiest/most complicated step of everything I've proposed...

http://wiki.apache.org/cassandra/Operations#Replication

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2011, at 6:05 AM, William Oberman wrote:

> I was hoping to transition my "simple" cassandra cluster (where each node is
> a cassandra + hadoop tasktracker) to a cluster with two virtual datacenters
> (vanilla cassandra vs. cassandra + hadoop tasktracker), based on this:
> http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig
> The problem I'm having is my hadoop jobs are getting heavy enough it's
> affecting my user facing performance on my cluster.
>
> Right now I'm in AWS, and I have 4 nodes in us-east split over two
> availability zones ("us-east-1c" that I'll call "c" and "us-east-1d" that
> I'll call "d"), setup with this keyspace:
> create keyspace civicscience with replication_factor=3 and strategy_options =
> [{us-east:3}] and
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';
> And I'm using the Ec2Snitch.
>
> I'm wondering if I write my own snitch that extends Ec2Snitch with overrides
> as follows:
> getDC = if(AZ == c || d) return return us-east (to keep current nodes the
> same) else return us-east-hadoop;
> getRack = return super(); (returning a,b,c,d seems ok)
>
> Then, if I boot N new nodes into us-east-1[a,b] they will be "hadoop" nodes
> because of the snitch. I'll obviously have to change my home brew cassandra
> + hadoop instances to selectively run task trackers or not (a/b = yes, and
> c/d = no).
>
> But:
> -Is the overall RF=3 still ok?
> -What is the recommended split between "normal" and "hadoop" in terms of
> strategy_options (assuming RF=3)? 2/1?
> -Can I (how do I safely) change the keyspace strategy_options from
> [{us-east:3}] to [{us-east:2, us-east-hadoop:1}] This seems like the
> riskiest/most complicated step of everything I've proposed...
> -After I change the options, what (if anything) would I have to do to migrate
> data around?
>
> One final question: should I add new nodes as Brisk instances instead of my
> home brew cassandra + hadoop nodes? I've obviously already put in the
> pain/effort of learning how to run hadoop + cassandra...
>
> Thanks for any help/advice!
>
> will
>
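
For illustration only, the PropertyFileSnitch suggestion in the reply above might translate into a conf/cassandra-topology.properties along the lines of the sketch below. The IP addresses are hypothetical placeholders, and the node counts and rack names are assumptions; the DC names us-east and us-east-hadoop are the ones from the original message. Each entry maps a node's IP to DC:rack, and the default line covers any node not listed.

```properties
# Sketch only: all IP addresses below are made-up placeholders.
# Format is <node IP>=<data center>:<rack>

# Existing nodes stay in the "us-east" DC (rack names are an assumption).
10.0.0.1=us-east:c
10.0.0.2=us-east:c
10.0.0.3=us-east:d
10.0.0.4=us-east:d

# New cassandra + hadoop tasktracker nodes go into their own DC.
10.0.1.1=us-east-hadoop:a
10.0.1.2=us-east-hadoop:b

# Fallback for any node not listed above.
default=us-east:c
```

This assumes every node's cassandra.yaml has endpoint_snitch set to org.apache.cassandra.locator.PropertyFileSnitch and that the same properties file is deployed to all nodes. If the existing nodes' DC or rack names differ from what the Ec2Snitch currently reports, replica placement could change, so it is worth checking the current names before switching.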