On Mon, Feb 21, 2011 at 5:14 PM, David Boxenhorn <da...@lookin2.com> wrote:
> No, that's not what I mean at all.
>
> That message is about the ability to use different partitioners for
> different CFs, say, RandomPartitioner for one, OPP for another.
>
> I'm talking about defining how many nodes a CF should be distributed over,
> which would be useful if you have a lot of nodes and a lot of small CFs
> (small relative to the total amount of data).
>
>
> On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton <aa...@thelastpickle.com>
> wrote:
>>
>> Sounds a bit like this idea
>> http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html
>>
>> Aaron
>>
>> On 22/02/2011, at 1:28 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>> > Cassandra is both distributed and replicated. We have Replication Factor
>> > but no Distribution Factor!
>> >
>> > Distribution Factor would define over how many nodes a CF should be
>> > distributed.
>> >
>> > Say you want to support millions of multi-tenant users in clusters with
>> > thousands of nodes, where you don't know the user's schema in advance, so
>> > you can't have users share CFs.
>> >
>> > In this case you wouldn't want to spread out each user's Column Families
>> > over thousands of nodes! You would want something like RF=3, DF=10, i.e.
>> > distribute each CF over 10 nodes and replicate 3 times within those nodes.
>> >
>> > One implementation of DF would be to hash the CF name, and use the same
>> > strategies defined for RF to choose the N nodes in DF=N.
>> >
>
>
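The Distribution Factor idea quoted above (hash the CF name, then reuse the replica-placement strategy inside the chosen nodes) could look roughly like this. It is a hypothetical sketch, not Cassandra code: `md5_token`, `walk_ring` and `replicas_for_key` are invented names, the ring is modeled as a sorted list of (token, node) pairs with one token per node, and a SimpleStrategy-style clockwise walk is assumed for both the DF step and the RF step.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple

Ring = List[Tuple[int, str]]  # (token, node) pairs sorted by token, one token per node

def md5_token(value: str) -> int:
    # Map a string onto a 128-bit ring position using MD5.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

def walk_ring(ring: Ring, token: int, count: int) -> List[str]:
    # Start at the first node whose token is >= `token`, then walk clockwise
    # (wrapping around) and collect `count` consecutive nodes.
    if count > len(ring):
        raise ValueError("count exceeds number of nodes in ring")
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    return [ring[(start + k) % len(ring)][1] for k in range(count)]

def replicas_for_key(ring: Ring, cf_name: str, row_key: str, df: int, rf: int) -> List[str]:
    # DF step: hash the CF name to pick the `df` nodes that hold this CF.
    cf_nodes = set(walk_ring(ring, md5_token(cf_name), df))
    # RF step: on the sub-ring of those nodes, place the row's `rf` replicas the same way.
    sub_ring = [(t, n) for t, n in ring if n in cf_nodes]
    return walk_ring(sub_ring, md5_token(row_key), rf)

if __name__ == "__main__":
    # Hypothetical 20-node cluster with evenly spaced tokens over the MD5 range.
    ring = [(i * 2**128 // 20, f"node{i:02d}") for i in range(20)]
    print(replicas_for_key(ring, "tenant42_users", "alice", df=10, rf=3))
```

With RF=3, DF=10 as in the quoted example, every row of `tenant42_users` lands on the same 10 nodes, while other CF names hash to different (possibly overlapping) groups of 10.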

The single partitioner is "baked in": one partitioner is configured for the whole cluster, so it cannot differ per keyspace or CF.

Here is a possible workaround: use OPP (OrderPreservingPartitioner), but MD5-hash your keys client side.
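A minimal sketch of the client-side hashing, assuming string row keys. `hashed_row_key` and `original_key` are hypothetical helpers, and prefixing the digest to the original key (rather than replacing the key outright) is an assumption made here so the original key can still be read back from the row key.

```python
import hashlib

def hashed_row_key(key: str) -> str:
    # Hypothetical helper: prefix the key with its 32-char MD5 hex digest so that,
    # under OPP, rows sort by hash and spread evenly around the ring.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest}:{key}"

def original_key(row_key: str) -> str:
    # The hex digest never contains ':', so the first ':' is always the separator,
    # even when the original key itself contains colons.
    return row_key.split(":", 1)[1]

if __name__ == "__main__":
    rk = hashed_row_key("user:1001")
    print(rk, "->", original_key(rk))
```

The trade-off is that range scans over such a CF come back in hash order, so you give up meaningful key-ordered range queries, which is the same trade RandomPartitioner makes.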

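That gives the hashed keyspace an even distribution, but it falls apart when you have keyspaces on OPP with different key distributions: every keyspace shares the same node tokens, so tokens that balance one keyspace can badly unbalance another.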
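To see why mixed key distributions break this, here is a toy OPP ring. The four nodes and their tokens are hypothetical, picked so each node owns four of the sixteen possible leading hex digits; ownership follows the OPP rule that a key goes to the first node whose token sorts at or after it, wrapping around. One keyspace stores MD5-hashed keys, another stores raw `user00000`-style keys against the same tokens.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Hypothetical 4-node OPP ring; each node owns 4 of the 16 leading hex digits.
TOKENS = ["4", "8", "c", "g"]

def owner(key: str) -> int:
    # OPP placement: first node whose token is >= the key in string order, wrapping to node 0.
    return bisect_left(TOKENS, key) % len(TOKENS)

def load(keys):
    # Number of keys owned by each node, in token order.
    counts = Counter(owner(k) for k in keys)
    return [counts.get(i, 0) for i in range(len(TOKENS))]

raw_keys = [f"user{i:05d}" for i in range(10000)]
hashed_keys = [hashlib.md5(k.encode("utf-8")).hexdigest() for k in raw_keys]

print("hashed keyspace:", load(hashed_keys))
print("raw keyspace:   ", load(raw_keys))  # [10000, 0, 0, 0]: every "user..." key sorts after "g"
```

The hashed keys spread roughly evenly across the four nodes, while every raw key sorts after the last token and wraps to node 0, even though both keyspaces live on the same ring.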
