This feature interests me, so I thought I'd add some comments. Having used the partitioning features in existing databases like DB2 and Oracle, as well as manual partitioning, I've found that one of the biggest challenges is keeping the partitions balanced. With manual partitioning in particular, the partitions often get unbalanced: the developers usually take a best guess and hope it ends up balanced.
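To make the balance concern concrete, here is a rough Python sketch. It assumes a hypothetical "prefix.uniquekey" row key format, a made-up four-node cluster, and a simple hash-modulo placement. None of this is Cassandra's actual partitioner or ring logic; the names node_for, prefix_partition, full_key_partition and imbalance are invented for illustration. It compares placing rows by prefix only against placing them by the full key, using a deliberately skewed workload.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size, chosen only for this example


def node_for(token: str, num_nodes: int = NUM_NODES) -> int:
    """Map a token to a node index by hashing it (MD5 + modulo, illustration only)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def prefix_partition(row_key: str) -> int:
    """Place a row using only the part of the key before the first '.'."""
    prefix = row_key.split(".", 1)[0]
    return node_for(prefix)


def full_key_partition(row_key: str) -> int:
    """Place a row using the whole key, so related rows may land on different nodes."""
    return node_for(row_key)


def imbalance(counts: Counter, num_nodes: int = NUM_NODES) -> float:
    """Ratio of the busiest node's row count to a perfectly even share."""
    total = sum(counts.values())
    return max(counts.values()) / (total / num_nodes)


# Made-up, skewed workload: one prefix owns 70% of the rows.
keys = (
    [f"CA.{i}" for i in range(700)]
    + [f"NY.{i}" for i in range(200)]
    + [f"VT.{i}" for i in range(100)]
)

by_prefix = Counter(prefix_partition(k) for k in keys)
by_full_key = Counter(full_key_partition(k) for k in keys)

print("rows per node, prefix only:", dict(by_prefix))
print("rows per node, full key:   ", dict(by_full_key))
print("imbalance, prefix only:", round(imbalance(by_prefix), 2))
print("imbalance, full key:   ", round(imbalance(by_full_key), 2))
```

With prefix-only placement, every "CA" row lands on the same node, so the busiest node holds at least 700 of the 1000 rows (an imbalance of at least 2.8 against an even share of 250). That co-location is the point of the proposal, but it also means the balance depends entirely on how the prefixes are chosen.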
Some of the approaches I've used in the past were zip code, area code, state and some kind of hash. So my question related to deterministic sharding is this: what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production.

Back when I worked on mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it was provisioned the first time. Once the phone was associated with that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to "scale up" the hardware.

My understanding is that Cassandra's current sharding is consistent and random. Does the new feature sit somewhere in between? Are you thinking of a pluggable API so that you can provide your own hash algorithm for Cassandra to use?

On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday <daniel.double...@gmx.net> wrote:
> Allow for deterministic / manual sharding of rows.
>
> Right now it seems that there is no way to force rows with different row keys
> will be stored on the same nodes in the ring.
> This is our number one reason why we get data inconsistencies when nodes fail.
>
> Sometimes a logical transaction requires writing rows with different row
> keys. If we could use something like this:
>
> prefix.uniquekey and let the partitioner use only the prefix the probability
> that only part of the transaction would be written could be reduced
> considerably.
>
>
>
> On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:
>
>> Hi all,
>>
>> Two years ago I asked for Cassandra use cases and feature requests.
>> [1] The results [2] have been extremely useful in setting and
>> prioritizing goals for Cassandra development. But with the release of
>> 1.0 we've accomplished basically everything from our original wish
>> list. [3]
>>
>> I'd love to hear from modern Cassandra users again, especially if
>> you're usually a quiet lurker. What does Cassandra do well? What are
>> your pain points? What's your feature wish list?
>>
>> As before, if you're in stealth mode or don't want to say anything in
>> public, feel free to reply to me privately and I will keep it off the
>> record.
>>
>> [1]
>> http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
>> [2]
>> http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
>> [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com