Hi Nick,
My understanding of the tokens in Cassandra is that the key is inserted in
the node which has the closest token [1]. This is very similar to clustering
with the k-means approach, where every vector gets assigned to the cluster
with the closest center. That is not equivalent to the min/max
Tokens are really no different than thresholds. Your token is your min and your
neighbor's token is your max. To change your min, you move your token. To change
your max, you move your neighbor's token.
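To make the difference between the two readings concrete, here is a small Python sketch. It assumes integer tokens kept in a sorted list. `owner_by_nearest` assigns a key to the closest token (the k-means-style reading above). `owner_by_threshold` treats each token as its node's MIN and the next token as its MAX, and wraps keys below the first token around to the last node. Both function names are made up for illustration and are not Cassandra code. The boundary convention follows this message and is not necessarily the one Cassandra itself uses.

```python
import bisect
from typing import List


def owner_by_nearest(tokens: List[int], key: int) -> int:
    """Index of the node whose token is numerically closest to key (ties go to the lower index)."""
    return min(range(len(tokens)), key=lambda i: abs(tokens[i] - key))


def owner_by_threshold(tokens: List[int], key: int) -> int:
    """Node i owns [tokens[i], tokens[i + 1]): its token is its MIN, its neighbor's token its MAX.

    tokens must be sorted. The last node owns everything from tokens[-1] upward,
    plus keys below tokens[0], which wrap around the ring.
    """
    i = bisect.bisect_right(tokens, key) - 1
    return i if i >= 0 else len(tokens) - 1


tokens = [0, 100, 200]
print(owner_by_nearest(tokens, 90))    # 1: token 100 is closer than token 0
print(owner_by_threshold(tokens, 90))  # 0: 0 <= 90 < 100

# Moving node 1's token down lowers node 0's MAX, so key 90 changes hands.
tokens = [0, 80, 200]
print(owner_by_threshold(tokens, 90))  # 1: 80 <= 90 < 200
```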
Your idea of calculating the optimal number of keys is similar to the load
balancing idea described
Hi All,
There might be a simpler way to make the OPP achieve even, or close to even,
loads.
The small change here is that the OPP has to use thresholds to distribute
keys instead of centers. Every node should have a MIN and a MAX threshold. A
key gets inserted in a node x if MIN_x
MAX_(n-1), then
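Here is a minimal sketch of how MIN thresholds could be chosen for an even load. It makes three assumptions: the thresholds are contiguous (each node's MAX is the next node's MIN), a representative sample of keys is available, and keys below the first threshold go to node 0. `balanced_thresholds` and `assign` are hypothetical helpers and not part of Cassandra's OPP. The idea is to place the boundaries at evenly spaced positions (quantiles) of the sorted sample rather than at evenly spaced points in key space.

```python
import bisect
from collections import Counter
from typing import List


def balanced_thresholds(sample_keys: List[str], n_nodes: int) -> List[str]:
    """Pick one MIN threshold per node at evenly spaced positions in the sorted sample."""
    if n_nodes < 1 or not sample_keys:
        raise ValueError("need at least one node and one sample key")
    keys = sorted(sample_keys)
    return [keys[(i * len(keys)) // n_nodes] for i in range(n_nodes)]


def assign(thresholds: List[str], key: str) -> int:
    """Node x gets key if MIN_x <= key < MIN_(x+1); the last node has no upper bound."""
    i = bisect.bisect_right(thresholds, key) - 1
    return max(i, 0)  # keys below the first threshold go to node 0


# A skewed key set: 90 keys start with "a", only 10 with "m".
keys = [f"a{i:03d}" for i in range(90)] + [f"m{i:03d}" for i in range(10)]
thresholds = balanced_thresholds(keys, 4)
loads = Counter(assign(thresholds, k) for k in keys)
print(thresholds)             # ['a000', 'a025', 'a050', 'a075']
print(sorted(loads.items()))  # [(0, 25), (1, 25), (2, 25), (3, 25)]
```

If the key distribution later drifts, recomputing the thresholds means moving every key that crosses a changed boundary. In that sense it has the same cost as moving tokens.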
Hi Jonathan,
I've never seen a paper that discusses it as a primary topic; it is
always in some other context. IIRC, the most recent discussions of it
I have seen have been in join algorithm literature from somewhere in
Asia. MPP analytical databases often implement some form of skew
adaptivity bu
What are some good papers to read for background?
On Tue, Aug 24, 2010 at 12:26 PM, J. Andrew Rogers wrote:
> On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong wrote:
>> OrderPreservingPartitioner is efficient for range queries but can cause
>> unevenly distributed data. Does anyone have an idea for a
On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong wrote:
> OrderPreservingPartitioner is efficient for range queries but can cause
> unevenly distributed data. Does anyone have an idea for a
> HybridPartitioner which takes advantage of both RandomPartitioner
> and OPP, or at least a partitioner trade of
The trade-off is in choosing which property you need: order preservation or
even load distribution.
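You can see the trade-off by comparing the two kinds of token for a handful of ordered keys. This Python sketch assumes that RandomPartitioner derives its token from the MD5 digest of the key, read as a signed big-endian integer and made non-negative. That is my understanding of its behaviour, but it should be checked against the source. `opp_token` simply uses the key itself, as an order-preserving partitioner does. The function names are illustrative and are not Cassandra APIs.

```python
import hashlib
from typing import List


def random_partitioner_token(key: str) -> int:
    """Assumed RandomPartitioner-style token: abs() of the MD5 digest read as a signed 128-bit int."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def opp_token(key: str) -> str:
    """Order-preserving token: the key itself, so token order equals key order."""
    return key


keys: List[str] = [f"user{i:02d}" for i in range(8)]

# OPP: consecutive keys get consecutive tokens, so a range scan touches few nodes,
# but a dense or hot key prefix piles up on the same node(s).
print(sorted(keys, key=opp_token) == keys)  # True

# Random: tokens are scattered over [0, 2**127], which spreads load evenly,
# but neighbouring keys land far apart, so a range scan generally has to ask every node.
for k in sorted(keys, key=random_partitioner_token):
    print(k, random_partitioner_token(k))
```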
The only reason a hybrid partitioner doesn't exist is that no one has been
able to create one. If you can create a partitioner that allows ordering
whilst ensuring an even load distribution, by all
Use OPP and prefix keys with a randomizing element when range queries
will not be required. For keys that will be queried in ranges, don't
use such a prefix.
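Here is a minimal sketch of that suggestion. It assumes string keys, a hypothetical bucket count `NUM_BUCKETS`, and a two-hex-digit prefix derived from an MD5 hash of the key. Any stable hash would do. A truly random prefix would break point lookups, because a reader could not recompute it. `row_key` is an illustrative name and not a Cassandra API.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; more buckets spread load more finely


def row_key(key: str, range_queryable: bool) -> str:
    """Return the key to store under OPP.

    Keys that need range queries are stored as-is so their order is preserved.
    Other keys get a stable pseudo-random bucket prefix, which scatters them
    around the ring much as RandomPartitioner would.
    """
    if range_queryable:
        return key
    bucket = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02x}:{key}"


print(row_key("2010-08-24T12:26", range_queryable=True))
print(row_key("user42", range_queryable=False))
```

The prefix is a pure function of the key, so a point read can rebuild the stored key. The cost is that a scan over prefixed keys returns them grouped by bucket, not in key order.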
On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong wrote:
> Hi,
> OrderPreservingPartitioner is efficient for range queries but can cause unevent
Hi,
OrderPreservingPartitioner is efficient for range queries but can cause unevenly
distributed data.
Does anyone have an idea for a HybridPartitioner which takes advantage of both
RandomPartitioner and OPP,
or at least a partitioner that trades off between them?
https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java
On Sun, Aug 22, 2010 at 10:46 AM, Hien. To Trong wrote:
> Hi,
> I am developing a system with some features like Cassandra.
> I want to add an order-preserving partitioning
Hi,
I am developing a system with some features like Cassandra.
I want to add an order-preserving partitioning strategy, but I don't know how to
implement it.
In the Cassandra paper, "Cassandra - A Decentralized Structured Storage System":
"Cassandra partitions data across the cluster using
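As a starting point, here is a toy Python sketch of roughly the scheme that paper describes. It uses consistent hashing, where each key goes to the first node clockwise whose token is greater than or equal to the key, but the hash is replaced by the identity so that key order is preserved. The class name, the string tokens, and the helper methods are assumptions for illustration. A real system also needs replication, token assignment, and load balancing on top.

```python
import bisect
from typing import Dict, List


class OrderPreservingRing:
    """Toy ring where a key's position is the key itself (an order-preserving 'hash').

    A node owns the keys in (previous node's token, its own token]; keys past the
    largest token wrap around to the node with the smallest token.
    """

    def __init__(self, token_to_node: Dict[str, str]) -> None:
        if not token_to_node:
            raise ValueError("ring needs at least one node")
        self.tokens: List[str] = sorted(token_to_node)
        self.nodes: List[str] = [token_to_node[t] for t in self.tokens]

    def owner(self, key: str) -> str:
        # First token >= key, walking clockwise; past the end, wrap to index 0.
        i = bisect.bisect_left(self.tokens, key)
        return self.nodes[i % len(self.tokens)]

    def nodes_for_range(self, start: str, end: str) -> List[str]:
        """Nodes holding keys in [start, end], assuming start <= end (no wrap-around range)."""
        first = bisect.bisect_left(self.tokens, start)
        last = bisect.bisect_left(self.tokens, end)
        indices = [i % len(self.tokens) for i in range(first, last + 1)]
        return list(dict.fromkeys(self.nodes[i] for i in indices))  # dedupe, keep ring order


ring = OrderPreservingRing({"g": "A", "n": "B", "t": "C"})
print(ring.owner("apple"), ring.owner("hello"), ring.owner("zebra"))  # A B A
print(ring.nodes_for_range("h", "p"))  # ['B', 'C']
```

Because positions follow key order, a range query only visits the consecutive nodes covering that range. That is the benefit of order preservation, and it is also why skewed keys can overload a few nodes.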