On Wed, Sep 28, 2011 at 4:36 AM, aaron morton <aa...@thelastpickle.com> wrote:
> Thats the one I was thinking of.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/09/2011, at 9:12 PM, Sylvain Lebresne wrote:
>
> > https://issues.apache.org/jira/browse/CASSANDRA-295
> >
> > --
> > Sylvain
> >
> > On Wed, Sep 28, 2011 at 10:06 AM, aaron morton <aa...@thelastpickle.com> wrote:
> >> The first thing I can think of is that the initial_token for the node must be a valid token according to the configured partitioner, as the tokens created by the partitioner are the things stored in the distributed hash tree. If you had a partitioner per KS you would need to configure the initial_token per KS.
> >> Also, it's not possible to *ever* change the partitioner, so it would have to be excluded from the KS update.
> >> They are not show stoppers, just the first things that come to mind.
> >> IIRC a lot of the other access happens in the context of a KS; there may be other issues but I've not checked the code.
> >> Anyone else?
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 28/09/2011, at 8:28 PM, Philippe wrote:
> >>
> >> Hi, is there any reason why configuring a partitioner per keyspace wouldn't be possible technically?
> >>
> >> Thanks.

The last time I asked about this, I heard it was "really baked in", which led me to plan on this not happening any time soon. If you really need two partitioners, my advice is to run two clusters.

In some cases multi-tenancy, depending on how you use the word, is possible, but in other cases it is a pipe dream. The reason I say this is that as you add more CFs and KSs to a cluster, you lower your ability to optimize for a specific keyspace. You inevitably get different workloads, and they internally start contending for resources.
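Aaron's point above, that initial_token must be a valid token for the configured partitioner, can be illustrated with a small sketch. This is not from the thread; it assumes the commonly used formula for evenly spacing RandomPartitioner tokens, whose token space is the integers in [0, 2**127):

```python
# Hedged sketch: why initial_token is partitioner-specific.
# RandomPartitioner tokens are integers in [0, 2**127); evenly spaced
# initial tokens for an N-node ring are typically computed like this.
# A ByteOrderedPartitioner, by contrast, uses byte-string tokens, so
# the same initial_token values would not be valid for it. This is
# why a partitioner per keyspace would also mean tokens per keyspace.

def random_partitioner_tokens(n_nodes):
    """Evenly spaced initial_token values for an n_nodes ring."""
    return [i * (2 ** 127 // n_nodes) for i in range(n_nodes)]

print(random_partitioner_tokens(4))
# first token is 0, second is 2**125, and so on
```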
You may also run into a situation where you need to scale only one CF, but because of the constraints of another you end up having to buy resources/hardware you do not need. (This depends on your workload; it is not a hard-and-fast rule.)

For example, say you have two column families and a 10-node cluster:

ColumnFamily A: 10 GB data/node, 500 reads/sec
ColumnFamily B: 500 GB data/node, 100 reads/sec

Imagine column family A will need to double its read traffic but column family B will not. With one cluster you end up buying 10 more nodes with 600 GB of disk space each. With two clusters you could have extended the capacity of one cluster without touching the other.

You can get this vibe by listening to some of the talks at Cassandra SF: http://twitter.com/#!/slideshare/status/78906858169057280 In particular, Twitter had precomputed a matrix of data size / number of servers / ops per sec. Rather than have one large cluster that holds all your data but is tuned for none of it, have smaller distinct clusters exactly tuned for each workload.

I am a bit off topic, but in general if you are considering two partitioners you almost certainly want two distinct clusters. Really, NONE of the operations (batch_mutate, multiget, ...) work across keyspaces anyway, so a design that spans keyspaces would be unorthodox.
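The capacity arithmetic behind that example can be written out explicitly. This is just a back-of-the-envelope sketch of the numbers in the post (the 600 GB figure allows some headroom over the raw 510 GB per node):

```python
# Back-of-the-envelope sketch of the sizing example in the post:
# two column families with very different data/throughput profiles.

cf_a_gb_per_node = 10    # ColumnFamily A: 10 GB data per node
cf_b_gb_per_node = 500   # ColumnFamily B: 500 GB data per node
nodes_to_add = 10        # doubling a 10-node cluster to double A's reads

# Shared cluster: every node holds both CFs, so each new node needs
# disk for A + B (510 GB raw; the post buys 600 GB for headroom).
shared_disk_per_node = cf_a_gb_per_node + cf_b_gb_per_node
extra_shared_disk = nodes_to_add * shared_disk_per_node

# Dedicated clusters: scaling A's cluster only needs A-sized disks.
extra_dedicated_disk = nodes_to_add * cf_a_gb_per_node

print(shared_disk_per_node)   # 510
print(extra_shared_disk)      # 5100
print(extra_dedicated_disk)   # 100
```

So doubling CF A's read capacity in the shared cluster means provisioning roughly 5 TB of disk that only exists to carry CF B along for the ride.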