On Mon, Feb 21, 2011 at 5:14 PM, David Boxenhorn <da...@lookin2.com> wrote:
> No, that's not what I mean at all.
>
> That message is about the ability to use different partitioners for
> different CFs, say, RandomPartitioner for one, OPP for another.
>
> I'm talking about defining how many nodes a CF should be distributed over,
> which would be useful if you have a lot of nodes and a lot of small CFs
> (small relative to the total amount of data).
>
>
> On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton <aa...@thelastpickle.com>
> wrote:
>>
>> Sounds a bit like this idea
>> http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html
>>
>> Aaron
>>
>> On 22/02/2011, at 1:28 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>> > Cassandra is both distributed and replicated. We have Replication Factor
>> > but no Distribution Factor!
>> >
>> > Distribution Factor would define over how many nodes a CF should be
>> > distributed.
>> >
>> > Say you want to support millions of multi-tenant users in clusters with
>> > thousands of nodes, where you don't know the user's schema in advance, so
>> > you can't have users share CFs.
>> >
>> > In this case you wouldn't want to spread out each user's Column Families
>> > over thousands of nodes! You would want something like: RF=3, DF=10 i.e.
>> > distribute each CF over 10 nodes, within those nodes replicate 3 times.
>> >
>> > One implementation of DF would be to hash the CF name, and use the same
>> > strategies defined for RF to choose the N nodes in DF=N.
>> >
>
>
The single partitioner is "baked in". Here is a possible solution: use OPP, but MD5-hash your keys client side for the CFs where you want a random distribution. That covers the mixed-partitioner case, but it falls apart when you have keyspaces that use OPP with different key distributions.
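
A minimal sketch of the client-side hashing in Python. The function name hashed_row_key is only illustrative, and the actual write to Cassandra is left out; the point is just that the key handed to the cluster is the MD5 digest rather than the raw key, so under OPP the rows sort by digest instead of by their natural order:

```python
import hashlib


def hashed_row_key(raw_key: str) -> str:
    """Return the hex MD5 digest of raw_key, to be used as the row key.

    Hypothetical helper: the name and the choice of a hex string
    (rather than raw bytes) are assumptions made for this sketch.
    """
    return hashlib.md5(raw_key.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Sequential raw keys end up scattered once hashed.
    for raw in ("user:1001", "user:1002", "user:1003"):
        print(raw, "->", hashed_row_key(raw))
```

The trade-off is that range scans over the original key order are no longer possible for the hashed CFs, and the client has to apply the same hash on every read.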