Re: Distribution Factor: part of the solution to many-CF problem?

Edward Capriolo Tue, 22 Feb 2011 09:03:06 -0800

On Mon, Feb 21, 2011 at 5:14 PM, David Boxenhorn <da...@lookin2.com> wrote:
> No, that's not what I mean at all.
>
> That message is about the ability to use different partitioners for
> different CFs, say, RandomPartitioner for one, OPP for another.
>
> I'm talking about defining how many nodes a CF should be distributed over,
> which would be useful if you have a lot of nodes and a lot of small CFs
> (small relative to the total amount of data).
>
>
> On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton <aa...@thelastpickle.com>
> wrote:
>>
>> Sounds a bit like this idea
>> http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html
>>
>> Aaron
>>
>> On 22/02/2011, at 1:28 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>> > Cassandra is both distributed and replicated. We have Replication Factor
>> > but no Distribution Factor!
>> >
>> > Distribution Factor would define over how many nodes a CF should be
>> > distributed.
>> >
>> > Say you want to support millions of multi-tenant users in clusters with
>> > thousands of nodes, where you don't know the user's schema in advance, so
>> > you can't have users share CFs.
>> >
>> > In this case you wouldn't want to spread out each user's Column Families
>> > over thousands of nodes! You would want something like: RF=3, DF=10 i.e.
>> > distribute each CF over 10 nodes, within those nodes replicate 3 times.
>> >
>> > One implementation of DF would be to hash the CF name, and use the same
>> > strategies defined for RF to choose the N nodes in DF=N.
>> >
>
>
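The DF implementation suggested in the quoted message (hash the CF name, then choose N nodes the way replicas are chosen for RF) could be sketched roughly as follows. This is only an illustration, not anything Cassandra provides: `nodes_for_cf`, the list of ring tokens, and the "walk clockwise from the hash" rule are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left


def nodes_for_cf(cf_name, ring_tokens, df):
    """Pick `df` consecutive ring positions, starting at the MD5 hash of the CF name.

    Hypothetical helper: `ring_tokens` is a list of integer node tokens in the
    0..2**128 range, and walking the ring clockwise is an assumed placement rule.
    """
    if not 0 < df <= len(ring_tokens):
        raise ValueError("df must be between 1 and the number of nodes")
    tokens = sorted(ring_tokens)
    # Hash the CF name into the same 128-bit space as the tokens.
    h = int(hashlib.md5(cf_name.encode("utf-8")).hexdigest(), 16)
    # First node whose token is >= the hash, wrapping around the ring.
    start = bisect_left(tokens, h) % len(tokens)
    return [tokens[(start + i) % len(tokens)] for i in range(df)]


# 100 evenly spaced tokens standing in for a 100-node cluster.
ring = [i * (2**128 // 100) for i in range(100)]
subset = nodes_for_cf("tenant42_users", ring, 10)
print(len(subset))
```

With RF=3 and DF=10, replicas for each row would then be placed only among those 10 nodes rather than across the whole ring.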


The single partitioner is "baked in".

Here is a possible solution: use OPP, but MD5-hash your keys client side.

That works around the single partitioner, but it falls apart when you
have keyspaces that use OPP with different key distributions.
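A minimal sketch of the client-side hashing, assuming keys are strings and the hex digest is what gets stored as the row key (`hashed_key` is a made-up helper name, not part of any Cassandra client):

```python
import hashlib


def hashed_key(raw_key: str) -> str:
    """Return the hex MD5 digest of raw_key, for use as the row key under OPP."""
    return hashlib.md5(raw_key.encode("utf-8")).hexdigest()


# CFs that need random distribution get hashed keys;
# CFs that need ordered range scans keep their raw keys.
for raw in ["user:1", "user:2", "user:3"]:
    print(raw, "->", hashed_key(raw))
```

The cost is that range scans over a hashed CF no longer follow the order of the original keys, so the original key has to be kept in a column if you need it back.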
