Re: Order preserving partitioning strategy

2010-08-26 Thread Mohamed Ibrahim
Hi Nick, My understanding of the tokens in Cassandra is that the key is inserted in the node which has the closest token [1]. This is very similar to clustering with the k-means approach, where every vector gets assigned to the cluster with the closest center. That is not equivalent to the min/max

Re: Order preserving partitioning strategy

2010-08-26 Thread Nick Bailey
Tokens are really no different than thresholds. Your token is your min and your neighbor's token is your max. To change your min, you move your token. To change your max, you move your neighbor's token. Your idea of calculating the optimal number of keys is similar to the load balancing idea described
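
A minimal sketch of the token-as-threshold view described in this message, not Cassandra's actual code. It assumes string tokens, a sorted token list, and that a node's own token is the inclusive lower bound of its range; the function name `owner` and the sample tokens are hypothetical.

```python
from bisect import bisect_right

def owner(tokens, key):
    """Return the token of the node whose range contains key.

    Assumption (illustrative only): tokens is sorted, and each node owns
    the keys from its own token (its min) up to, but not including, the
    next node's token (its max). Keys below the smallest token wrap
    around to the last node on the ring.
    """
    i = bisect_right(tokens, key) - 1   # last token <= key, or -1 if none
    return tokens[i % len(tokens)]      # -1 wraps to the last node

# Three hypothetical nodes with tokens "d", "m" and "t".
ring = ["d", "m", "t"]
print(owner(ring, "n"))   # key "n" falls in the range that starts at "m"

# Moving the middle node's token from "m" to "p" changes its min, which is
# also its left neighbor's max, so "n" now belongs to the node at "d".
moved = ["d", "p", "t"]
print(owner(moved, "n"))
```

Moving a single token shifts exactly one boundary, which is why rebalancing in this view only ever involves a node and its neighbor.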

Re: Order preserving partitioning strategy

2010-08-26 Thread Mohamed Ibrahim
Hi All, There might be a simpler way to make the OPP achieve even, or close to even loads. The small change here is that the OPP has to use thresholds to distribute keys instead of centers. Every node should have a MIN and a MAX threshold. A key gets inserted in a node x if MIN_x MAX_(n-1), then

Re: Order preserving partitioning strategy

2010-08-25 Thread J. Andrew Rogers
Hi Jonathan, I've never seen a paper that discusses it as a primary topic, it is always in some other context. IIRC, the most recent discussions of it I have seen have been in join algorithm literature from somewhere in Asia. MPP analytical databases often implement some form of skew adaptivity bu

Re: Order preserving partitioning strategy

2010-08-24 Thread Jonathan Ellis
What are some good papers to read for background?
On Tue, Aug 24, 2010 at 12:26 PM, J. Andrew Rogers wrote:
> On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong wrote:
>> OrderPreservingPartitioner is efficient for range queries but can cause
>> unevenly distributed data. Does anyone have an idea of a

Re: Order preserving partitioning strategy

2010-08-24 Thread J. Andrew Rogers
On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong wrote:
> OrderPreservingPartitioner is efficient for range queries but can cause
> unevenly distributed data. Does anyone have an idea of a
> HybridPartitioner which takes advantage of both RandomPartitioner
> and OPP, or at least a partitioner trade of

Re: Order preserving partitioning strategy

2010-08-24 Thread Nick Telford
The trade-off is in choosing which property you need: order preservation or even load distribution. The only reason a hybrid partitioner doesn't exist is that no one has been able to create one. If you can create a partitioner that allows ordering whilst ensuring an even load distribution, by all

Re: Order preserving partitioning strategy

2010-08-23 Thread Benjamin Black
Use OPP and prefix keys with a randomizing element when range queries will not be required. For keys that will be queried in ranges, don't use such a prefix.
On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong wrote:
> Hi,
> OrderPreservingPartitioner is efficient for range queries but can cause unevent
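
One way the key-prefix suggestion above might look in practice, as a rough sketch only. The helper name `storage_key`, the 4-character prefix length, the `:` separator, and the use of an MD5 digest as the "randomizing element" are all assumptions for illustration; the message does not specify any of them.

```python
import hashlib

def storage_key(key: str, needs_range_scan: bool) -> str:
    """Build the key that would be handed to an order-preserving partitioner.

    Keys that must support range queries are stored unchanged, so their
    order is preserved. All other keys get a short hash-derived prefix so
    that they scatter across the token space (hypothetical scheme).
    """
    if needs_range_scan:
        return key
    # The prefix is derived from the key itself, so a later point lookup
    # can recompute the same stored key without extra bookkeeping.
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}:{key}"

print(storage_key("2010-08-23/event-17", needs_range_scan=True))
print(storage_key("user42", needs_range_scan=False))
```

Deriving the prefix from the key rather than from a random number keeps single-key reads possible, at the cost of losing range scans over the prefixed keys.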

Re: Order preserving partitioning strategy

2010-08-23 Thread Hien. To Trong
Hi, OrderPreservingPartitioner is efficient for range queries but can cause unevenly distributed data. Does anyone have an idea for a HybridPartitioner which takes advantage of both RandomPartitioner and OPP, or at least a partitioner that trades off between them?

Re: Order preserving partitioning strategy

2010-08-22 Thread Benjamin Black
https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java
On Sun, Aug 22, 2010 at 10:46 AM, Hien. To Trong wrote:
> Hi,
> I am developing a system with some features like Cassandra.
> I want to add an order-preserving partitioning

Order preserving partitioning strategy

2010-08-22 Thread Hien. To Trong
Hi, I am developing a system with some features like Cassandra. I want to add an order-preserving partitioning strategy, but I don't know how to implement it. In the Cassandra paper, "Cassandra - A Decentralized Structured Storage System": "Cassandra partitions data across the cluster using