Hi!

> ...
> So actually if you use Cassandra – for the application the actual storage
> location of the data should not matter. It will be available anywhere in the
> cluster if it is stored on any reachable node.

I suspected as much: Cassandra does not provide a mechanism to strictly
constrain which nodes in a cluster hold the data for a specific key space,
because Cassandra is not designed for that purpose.

Thank you very much for your effort and detailed explanation.

> From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
> Sent: Tuesday, January 10, 2012 15:06
> To: user@cassandra.apache.org
> Subject: Re: AW: How to control location of data?
>
> Hi!
>
> Thank you for your last reply. I'm still wondering if I got you right...
>
> > ...
> > A partitioner decides into which partition a piece of data belongs
>
> Does your statement imply that the partitioner does not take any decisions at
> all on the (physical) storage location? Or put another way: What do you mean
> by "partition"?
>
> To quote http://wiki.apache.org/cassandra/ArchitectureInternals: "...
> AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc.
> replicas of each key range. Primary replica is always determined by the token
> ring (...)"
>
> > ...
> > You can select different placement strategies and partitioners for different
> > keyspaces, thereby choosing known data to be stored on known hosts.
> > This is however discouraged for various reasons – i.e. you need a lot of
> > knowledge about your data to keep the cluster balanced. What is your use case
> > for this requirement? There is probably a more suitable solution.
>
> What we want is to partition the cluster with respect to key spaces.
> That is, we want to establish an association between nodes and key spaces so
> that a node of the cluster holds data from a key space if and only if that
> node is a *member* of that key space.
>
> To our knowledge Cassandra has no built-in way to specify such a
> membership relation. Therefore we thought of implementing our own replica
> placement strategy, until we began to suspect that the partitioner would have
> to be replaced, too, to accomplish the task.
>
> Do you have any ideas?
>
>
> From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
> Sent: Tuesday, January 10, 2012 09:53
> To: user@cassandra.apache.org
> Subject: How to control location of data?
>
> Hi!
>
> We're evaluating Cassandra for our storage needs. One of the key benefits we
> see is the online replication of the data, that is, an easy way to share data
> across nodes. But we need to control precisely which node group specific
> parts of a key space (columns/column families) are stored on. Now we're
> having trouble understanding the documentation. Could anyone help us find
> some answers to our questions?
>
> · What does the term "replica" mean: If a key is stored on exactly three
>   nodes in a cluster, is it correct to say that there are three replicas
>   of that key, or are there just two replicas (copies) and one original?
> · What is the relation between the Cassandra concepts "Partitioner" and
>   "Replica Placement Strategy"? According to documentation found on the
>   DataStax web site and the architecture internals page of the Cassandra
>   Wiki, the first storage location of a key (and its associated data) is
>   determined by the "Partitioner", whereas additional storage locations are
>   defined by the "Replica Placement Strategy". I'm wondering if I could
>   completely redefine how nodes are selected to store a key by just
>   implementing my own subclass of AbstractReplicationStrategy and
>   configuring that subclass into the key space.
> · How can I prevent the "Partitioner" from being consulted at all to
>   determine which node stores a key first?
> · Is a key space always distributed across the whole cluster? Is it possible
>   to configure Cassandra in such a way that more or less freely chosen parts
>   of a key space (columns) are stored on arbitrarily chosen nodes?
>
> Any tips would be much appreciated :-)
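
For reference, the split asked about in the second question can be pictured
with a toy model. The sketch below is not Cassandra code: token_for,
replicas_for and the four-node ring are made-up names and values, and the
placement rule only assumes the simplest behaviour (take the next nodes
clockwise on the ring). It shows the partitioner turning a key into a token,
the token ring picking the first replica, and a separate placement step
picking the remaining ones.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Toy partitioner: hash a row key to a position on the ring.

    Assumption: an MD5-based token reduced to [0, 2**127), loosely in the
    spirit of a hash partitioner; the real token computation may differ.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 2**127


def replicas_for(key, ring, replication_factor):
    """Return the nodes that would hold `key` in this toy model.

    `ring` is a list of (token, node_name) pairs sorted by token.
    The first replica is the first node whose token is >= the key's token
    (wrapping around to the start of the ring); the remaining replicas are
    simply the next nodes clockwise. A custom placement strategy would
    replace only that second step, not the token lookup.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node ring with evenly spaced tokens.
example_ring = [
    (0, "node-a"),
    (2**125, "node-b"),
    (2**126, "node-c"),
    (3 * 2**125, "node-d"),
]

if __name__ == "__main__":
    print(replicas_for("some-row-key", example_ring, 3))
```

In this model, swapping the placement step changes which additional nodes get
copies, but the first node still follows from the key's token, which matches
the wiki quote above.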