Re: How to control location of data?

Roland Gude Tue, 10 Jan 2012 03:00:37 -0800

Hi,

I think everything is called a replica, so if data is on 3 nodes you have 3
replicas. There is no such thing as an original.


A partitioner maps each key to a token, i.e. it decides which partition (range of the token ring) a piece of data belongs to.
A replica placement strategy decides which nodes hold the replicas of each partition.
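The split between partitioner and placement strategy can be sketched as a toy model in Python. This is not Cassandra's code: `token_for`, `replicas_for` and the node names are hypothetical, the hash is only loosely modeled on RandomPartitioner (which uses MD5 and a 0..2**127 token range), and the placement is a simplified, SimpleStrategy-like walk around the ring with one token per node. It also shows why all copies are simply "replicas": the first entry is just where the walk starts.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # toy token space, similar in size to RandomPartitioner's


def token_for(key: str) -> int:
    """Toy partitioner: hash a row key to a token on the ring (MD5-based)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(key: str, ring: dict[int, str], replication_factor: int) -> list[str]:
    """Toy SimpleStrategy-like placement.

    `ring` maps each node's token to its name (one token per node). The node
    owning the key's token is the first node whose token is >= the key's token
    (wrapping around); further replicas are the next nodes clockwise.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


# Example: four nodes with evenly spaced tokens, replication factor 3.
ring = {0: "node1", RING_SIZE // 4: "node2", RING_SIZE // 2: "node3", 3 * RING_SIZE // 4: "node4"}
print(replicas_for("user:42", ring, 3))  # three distinct nodes, all equal replicas
```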

You cannot suppress the partitioner.

You can select different replica placement strategies for different
keyspaces (the partitioner itself is a cluster-wide setting), and by choosing
the partitioner and strategies carefully you can store known data on known hosts.
This is discouraged, however, for several reasons; for example, you need a lot of
knowledge about your data to keep the cluster balanced. What is your use case
for this requirement? There is probably a more suitable solution.
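Per-keyspace placement can be illustrated with another toy model. This is a hypothetical sketch loosely inspired by Cassandra's NetworkTopologyStrategy, where each keyspace states how many replicas it wants in each datacenter; the names (`dc_aware_replicas`, `KEYSPACES`, `DC_A`, `DC_B`, the node names) are made up, and rack awareness and other real-world details are left out. The "Restricted" keyspace lands only on DC_A's two nodes, which also illustrates the balancing concern mentioned above: those nodes carry all of that keyspace's data.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127


def token_for(key: str) -> int:
    """Toy MD5-based partitioner (same idea as RandomPartitioner, simplified)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % RING_SIZE


def dc_aware_replicas(key: str, ring: dict[int, str], node_dc: dict[str, str],
                      dc_replicas: dict[str, int]) -> list[str]:
    """Toy datacenter-aware placement, loosely inspired by NetworkTopologyStrategy.

    Walks the ring clockwise from the key's token and takes nodes only from the
    datacenters listed in `dc_replicas`, up to the requested count per datacenter.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    remaining = dict(dc_replicas)
    chosen = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        dc = node_dc[node]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
        if not any(remaining.values()):  # every datacenter has enough replicas
            break
    return chosen


# Hypothetical topology: two datacenters with two nodes each.
ring = {0: "a1", RING_SIZE // 4: "b1", RING_SIZE // 2: "a2", 3 * RING_SIZE // 4: "b2"}
node_dc = {"a1": "DC_A", "a2": "DC_A", "b1": "DC_B", "b2": "DC_B"}

# Hypothetical per-keyspace replication settings.
KEYSPACES = {
    "Restricted": {"DC_A": 2},         # data stays on DC_A's nodes only
    "Shared": {"DC_A": 1, "DC_B": 1},  # one replica in each datacenter
}

for name, options in KEYSPACES.items():
    print(name, dc_aware_replicas("order:1001", ring, node_dc, options))
```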

From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
Sent: Tuesday, January 10, 2012 09:53
To: user@cassandra.apache.org
Subject: How to control location of data?

Hi!

We're evaluating Cassandra for our storage needs. One of the key benefits we
see is the online replication of the data, which is an easy way to share data
across nodes. But we need to control precisely which group of nodes stores
specific parts of a keyspace (columns/column families). We're having trouble
understanding the documentation. Could anyone help us find answers to our
questions?

*  What does the term "replica" mean? If a key is stored on exactly three nodes
in a cluster, is it correct to say that there are three replicas of that
key, or are there just two replicas (copies) and one original?

*  What is the relation between the Cassandra concepts "Partitioner" and
"Replica Placement Strategy"? According to documentation on the DataStax
website and the architecture internals page of the Cassandra Wiki, the first storage
location of a key (and its associated data) is determined by the "Partitioner",
whereas additional storage locations are defined by the "Replica Placement
Strategy". I'm wondering whether I could completely redefine the way nodes are
selected to store a key just by implementing my own subclass of
AbstractReplicationStrategy and configuring that subclass for the keyspace.

*  How can I prevent the "Partitioner" from being consulted at all when
determining which node stores a key first?

*  Is a keyspace always distributed across the whole cluster? Is it possible
to configure Cassandra so that more or less freely chosen parts of a
keyspace (columns) are stored on arbitrarily chosen nodes?
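The questions about a custom AbstractReplicationStrategy subclass and about bypassing the partitioner can be explored with a third toy model. It assumes that a placement strategy is handed only the token produced by the partitioner, never the original key; we believe Cassandra's strategies work this way, but treat that as an assumption to check against the version in use. `hashed_token`, `ordered_token` and `prefix_pinning_strategy` are hypothetical names: the first mimics a hashing partitioner such as RandomPartitioner, the second an order-preserving one such as ByteOrderedPartitioner, whose tokens keep the raw key bytes.

```python
import hashlib


def hashed_token(key: str) -> int:
    """Toy hashing partitioner: similar keys get unrelated tokens."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % (2 ** 127)


def ordered_token(key: str) -> bytes:
    """Toy order-preserving partitioner: the token is the raw key bytes."""
    return key.encode("utf-8")


def prefix_pinning_strategy(token: bytes, pins: dict[bytes, list[str]],
                            default: list[str]) -> list[str]:
    """Hypothetical custom strategy: it sees only the token, never the key.

    With order-preserving tokens the key prefix is still visible, so all keys
    starting with a given prefix can be pinned to a fixed group of nodes.
    """
    for prefix, nodes in pins.items():
        if token.startswith(prefix):
            return nodes
    return default


PINS = {b"customerA:": ["node1", "node2"]}
DEFAULT = ["node3", "node4"]

print(prefix_pinning_strategy(ordered_token("customerA:42"), PINS, DEFAULT))  # ['node1', 'node2']
print(prefix_pinning_strategy(ordered_token("customerB:7"), PINS, DEFAULT))   # ['node3', 'node4']

# With a hashing partitioner the tokens are unrelated integers: no prefix is left to pin on.
print(hashed_token("customerA:1"), hashed_token("customerA:2"))
```

In this model a custom strategy can send data anywhere it likes, but it can only recognize "known data" if the partitioner preserves that information in the token. That is the trade-off raised in the reply above: with an order-preserving partitioner you need to know your key distribution well, or a few nodes end up holding most of the data.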

Any tips would be much appreciated :-)
