> Will it also use the
> snitch/strategy info to find next 'R' replicas 'closest' to
> coordinator-node ?

Yes.
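As a rough illustration of what 'closest' to the coordinator means, here is a small sketch that assumes the same-node < same-rack < same-datacenter < different-datacenter ordering described further down this thread. The `Node` type and the `proximity` / `sort_by_proximity` helpers are made up for the example and are not Cassandra APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node description; in Cassandra the snitch supplies dc and rack.
    name: str
    dc: str
    rack: str


def proximity(coordinator: Node, replica: Node) -> int:
    """Lower is closer: same node < same rack < same DC < other DC."""
    if replica == coordinator:
        return 0
    if replica.dc == coordinator.dc and replica.rack == coordinator.rack:
        return 1
    if replica.dc == coordinator.dc:
        return 2
    return 3


def sort_by_proximity(coordinator, replicas):
    # sorted() is stable, so replicas at equal distance keep their input order.
    return sorted(replicas, key=lambda r: proximity(coordinator, r))


coordinator = Node("a", "DC1", "r1")
replicas = [Node("d", "DC2", "r1"), Node("c", "DC1", "r2"), Node("b", "DC1", "r1")]
closest = sort_by_proximity(coordinator, replicas)[:2]  # R = 2
print([n.name for n in closest])
```

Here the same-rack replica is picked first, then the same-DC one; the replica in the other DC is only used if more than two are needed.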
> 2. In a single DC ( with n racks and r replicas ) what algorithm

The logic is here https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L78

> a. n>r : I am assuming, have 1 replica in each rack.

You have 1 replica in each of the first r racks found when walking the ring.

> b. n<r : ?? I am assuming, try to equally distribute replicas across
> in each racks.

Each rack gets about int(r/n) replicas and r % n racks get one more, depending on ring order. This is why multi-rack replication can be tricky.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2012, at 8:05 PM, prasenjit mukherjee wrote:

> Thanks. Some follow up questions :
>
> 1. How do the reads use strategy/snitch information ? I am assuming
> the reads can go to any of the replicas. Will it also use the
> snitch/strategy info to find next 'R' replicas 'closest' to
> coordinator-node ?
>
> 2. In a single DC ( with n racks and r replicas ) what algorithm
> cassandra uses to write its replicas in following scenarios :
> a. n>r : I am assuming, have 1 replica in each rack.
> b. n<r : ?? I am assuming, try to equally distribute replicas across
> in each racks.
>
> -Thanks,
> Prasenjit
>
> On Thu, Jul 12, 2012 at 11:24 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>> I highly recommend specifying the same rack for all nodes (using
>> cassandra-topology.properties) unless you really have a good reason not to
>> (and you probably don't). The way that replicas are chosen when multiple
>> racks are in play can be fairly confusing and lead to a data imbalance if
>> you don't catch it.
>>
>>
>> On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee <prasen....@gmail.com>
>> wrote:
>>>
>>>> As far as I know there isn't any way to use the rack name in the
>>>> strategy_options for a keyspace. You
>>>> might want to look at the code to dig into that, perhaps.
>>>
>>> Aha, I was wondering if I could do that as well ( specify rack options )
>>> :)
>>>
>>> Thanks for the pointer, I will dig into the code.
>>>
>>> -Thanks,
>>> Prasenjit
>>>
>>> On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe <richard.l...@arkivum.com>
>>> wrote:
>>>> If you then specify the parameters for the keyspace to use these, you
>>>> can control exactly which set of nodes replicas end up on.
>>>>
>>>> For example, in cassandra-cli:
>>>>
>>>> create keyspace ks1 with placement_strategy =
>>>> 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options
>>>> = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };
>>>>
>>>> As far as I know there isn't any way to use the rack name in the
>>>> strategy_options for a keyspace. You might want to look at the code to dig
>>>> into that, perhaps.
>>>>
>>>> Whichever snitch you use, the nodes are sorted in order of proximity to
>>>> the client node. How this is determined depends on the snitch that's used
>>>> but most (the ones that ship with Cassandra) will use the default ordering
>>>> of same-node < same-rack < same-datacenter < different-datacenter. Each
>>>> snitch has methods to tell Cassandra which rack and DC a node is in, so it
>>>> always knows which node is closest. Used with the Bloom filters this can
>>>> tell us where the nearest replica is.
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: prasenjit mukherjee [mailto:prasen....@gmail.com]
>>>> Sent: 11 July 2012 06:33
>>>> To: user
>>>> Subject: How to come up with a predefined topology
>>>>
>>>> Quoting from
>>>> http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy
>>>> :
>>>>
>>>> "Asymmetrical replication groupings are also possible depending on your
>>>> use case. For example, you may want to have three replicas per data center
>>>> to serve real-time application requests, and then have a single replica in a
>>>> separate data center designated to running analytics."
>>>>
>>>> Have 2 questions :
>>>> 1. Any example how to configure a topology with 3 replicas in one DC (
>>>> with 2 in 1 rack + 1 in another rack ) and one replica in another DC ?
>>>> The default networktopologystrategy with rackinferringsnitch will only
>>>> give me equal distribution ( 2+2 )
>>>>
>>>> 2. I am assuming the reads can go to any of the replicas. Is there a
>>>> client which will send query to a node ( in cassandra ring ) which is
>>>> closest to the client ?
>>>>
>>>> -Thanks,
>>>> Prasenjit
>>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax
>>
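
For reference, a minimal sketch of the rack-aware placement discussed at the top of this thread, for a single DC. It assumes a two-pass walk of the ring (first one replica per distinct rack, then any remaining replicas from the next unused nodes in ring order). That is one reading of the linked NetworkTopologyStrategy source, not a copy of it, and `place_replicas` and the ring layout below are hypothetical:

```python
def place_replicas(ring, start, r):
    """Pick r replicas in one DC (illustrative only, not Cassandra code).

    ring:  list of (node, rack) pairs in token order
    start: index of the first node at or after the row's token
    r:     number of replicas wanted in this DC
    """
    order = ring[start:] + ring[:start]  # walk the ring clockwise from start
    chosen, racks = [], set()

    # Pass 1: take at most one node per rack.
    for node, rack in order:
        if len(chosen) == r:
            break
        if rack not in racks:
            chosen.append(node)
            racks.add(rack)

    # Pass 2: racks are exhausted (n < r), so fill up with the next unused nodes.
    for node, _rack in order:
        if len(chosen) == r:
            break
        if node not in chosen:
            chosen.append(node)

    return chosen


ring = [("n1", "rackA"), ("n2", "rackA"), ("n3", "rackB"),
        ("n4", "rackA"), ("n5", "rackB"), ("n6", "rackC")]
three_racks = place_replicas(ring, 0, 2)       # n > r: one replica per rack
two_rack_ring = [(n, rk) for n, rk in ring if rk != "rackC"]
two_racks = place_replicas(two_rack_ring, 0, 3)  # n < r: some rack holds two
print(three_racks, two_racks)
```

In the n < r case the extra replica lands on whichever unused node comes next in ring order, so racks that do not alternate around the ring can end up with uneven load.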