Re: AW: AW: How to control location of data?

Andreas Rudolph Wed, 11 Jan 2012 07:03:53 -0800

Hi!

> ... 
> So actually if you use Cassandra – for the application the actual storage 
> location of the data should not matter. It will be available anywhere in the 
> cluster if it is stored on any reachable node.
I suspected as much: Cassandra does not provide a mechanism to strictly 
constrain which nodes in a cluster hold the data for a specific key space, 
because Cassandra is not designed for that purpose.
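
A toy Python model (not Cassandra code) may make the placement discussed in 
this thread concrete. It assumes one token per node, an MD5-based partitioner 
and a "next nodes clockwise on the ring" placement rule; the node names, 
RING_SIZE and all function names are made up for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for this toy model


def token_for(key):
    """Toy partitioner: hash a row key to a token on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for_token(token, ring, replication_factor):
    """Return the nodes that store the given token.

    ring is a list of (token, node) pairs sorted by token. The first node
    whose token is >= the given token holds the primary replica; the
    remaining replicas go to the next nodes clockwise on the ring.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)  # wrap around
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def replicas_for_key(key, ring, replication_factor):
    """Partitioner step followed by the placement step."""
    return replicas_for_token(token_for(key), ring, replication_factor)


# Four hypothetical nodes with evenly spaced tokens.
NODES = ["node-a", "node-b", "node-c", "node-d"]
RING = [(i * RING_SIZE // len(NODES), n) for i, n in enumerate(NODES)]

if __name__ == "__main__":
    for key in ("user:1", "user:2", "order:17"):
        print(key, replicas_for_key(key, RING, 3))
```

In this model every node on the ring is a candidate for every key, whatever 
key space the key belongs to. Confining a key space to a subset of nodes would 
mean changing which nodes the placement step is allowed to pick.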


Thank you very much for your effort and detailed explanation.

>  
> From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com] 
> Sent: Tuesday, 10 January 2012 15:06
> To: user@cassandra.apache.org
> Subject: Re: AW: How to control location of data?
>  
> Hi!
>  
> Thank you for your last reply. I'm still wondering if I got you right...
>  
> ... 
> A partitioner decides into which partition a piece of data belongs
> Does your statement imply that the partitioner does not make any decision at 
> all about the (physical) storage location? Or, put another way: what do you 
> mean by "partition"?
>  
> To quote http://wiki.apache.org/cassandra/ArchitectureInternals: "... 
> AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc. 
> replicas of each key range. Primary replica is always determined by the token 
> ring (...)"
> 
> 
> ... 
> You can select different placement strategies and partitioners for different 
> keyspaces, thereby choosing known data to be stored on known hosts.
> This is, however, discouraged for various reasons: for example, you need a 
> lot of knowledge about your data to keep the cluster balanced. What is your 
> use case for this requirement? There is probably a more suitable solution.
>  
> What we want is to partition the cluster with respect to key spaces.
> That is, we want to establish an association between nodes and key spaces so 
> that a node of the cluster holds data from a key space if and only if that 
> node is a *member* of that key space.
>  
> To our knowledge, Cassandra has no built-in way to specify such a 
> membership relation. We therefore thought of implementing our own replica 
> placement strategy, until we began to suspect that the partitioner would have 
> to be replaced as well to accomplish the task.
>  
> Do you have any ideas?
>  
> 
> 
> From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com] 
> Sent: Tuesday, 10 January 2012 09:53
> To: user@cassandra.apache.org
> Subject: How to control location of data?
>  
> Hi!
>  
> We're evaluating Cassandra for our storage needs. One of the key benefits we 
> see is the online replication of the data, that is, an easy way to share data 
> across nodes. But we need to control precisely which group of nodes stores 
> specific parts of a key space (columns/column families). Now we're having 
> trouble understanding the documentation. Could anyone help us find answers 
> to our questions?
> 
> ·  What does the term "replica" mean? If a key is stored on exactly three 
> nodes in a cluster, is it correct to say that there are three replicas of 
> that key, or are there just two replicas (copies) and one original?
> ·  What is the relation between the Cassandra concepts "Partitioner" and 
> "Replica Placement Strategy"? According to the documentation on the DataStax 
> web site and the architecture internals page of the Cassandra Wiki, the first 
> storage location of a key (and its associated data) is determined by the 
> "Partitioner", whereas additional storage locations are defined by the 
> "Replica Placement Strategy". I'm wondering whether I could completely 
> redefine how nodes are selected to store a key just by implementing my own 
> subclass of AbstractReplicationStrategy and configuring that subclass for the 
> key space.
> ·  How can I prevent the "Partitioner" from being consulted at all when 
> determining which node stores a key first?
> ·  Is a key space always distributed across the whole cluster? Is it possible 
> to configure Cassandra in such a way that more or less freely chosen parts of 
> a key space (columns) are stored on arbitrarily chosen nodes?
>  
> Any tips would be much appreciated :-)
>  
>
