Yes, the way I see it - and this becomes even more necessary in a multi-tenant configuration - there should be completely separate configurations for applications and for servers.
- Application configuration is based on the data and usage characteristics of your application.
- Server configuration is based on the specific hardware limitations of the server.

Obviously, server limitations take priority over application configuration. Assuming that each tenant in a multi-tenant environment gets one keyspace, you would also want to enforce limitations per keyspace (corresponding to the parameters the tenant paid for). So now we have three levels:

1. Server configuration (top priority)
2. Keyspace configuration (paid-for service - second priority)
3. Column family configuration (configuration provided by the tenant - third priority)

On Wed, Jan 19, 2011 at 3:15 PM, indika kumara <indika.k...@gmail.com> wrote:

> As the actual problem is mostly related to the number of CFs in the system
> (and possibly the number of columns), I still believe that exposing
> Cassandra ‘as-is’ to a tenant is doable and suitable, though it needs some
> fixes. That multi-tenancy model allows a tenant to use the programming
> model of Cassandra ‘as-is’, enabling seamless migration into the cloud of
> an application that uses Cassandra. Moreover, in order to support the
> different SLA requirements of different tenants, the configurability of
> keyspaces, CFs, etc. per tenant may be critical. However, there are
> trade-offs among usability, memory consumption, and performance. I believe
> it is important to consider the SLA requirements of different tenants when
> deciding the strategies for controlling resource consumption.
>
> I like the idea of system-wide parameters for controlling resource usage.
> I believe that tenant-specific parameters are equally important. There are
> resources, and each tenant can claim a portion of them based on an SLA.
> For instance, if there is a threshold on the number of columns per node,
> it should be possible to decide how many columns a particular tenant can
> have.
> This allows selecting a suitable Cassandra cluster for a tenant based on
> his or her SLA. I believe the capability to configure resource-controlling
> parameters per keyspace would be important to support the
> keyspace-per-tenant model. Furthermore, in order to maximize resource
> sharing among tenants, a per-keyspace threshold (on a resource) should not
> be a hard limit. Rather, it should oscillate between a hard minimum and a
> maximum. For example, if a particular tenant needs more resources at a
> given time, he or she should be able to borrow from the others up to the
> maximum. The threshold is only considered when a tenant is assigned to a
> cluster - the remaining resources of a cluster should be equal to or
> higher than the resource limit of the tenant. It may be necessary to
> spread a single keyspace across multiple clusters, especially when there
> are not enough resources in a single cluster.
>
> I believe it would be better to have the flexibility to seamlessly switch
> between multi-tenancy implementation models such as Cassandra ‘as-is’,
> the keyspace-per-tenant model, a keyspace for all tenants, and so on.
> Based on what I have learnt, each model requires adding a tenant id
> (namespace) to a keyspace’s name, a CF’s name, a row key, or a column’s
> name. Would it be better to have a kind of pluggable handler that can
> access those resources prior to performing the actual operation, so that
> the required changes can be made? Maybe prior to authorization.
>
> Thanks,
>
> Indika
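For what it's worth, the three-level precedence and the soft min/max keyspace threshold could be sketched roughly like this (the class and method names here are hypothetical illustrations, not existing Cassandra APIs):

```java
// Sketch of the config precedence and soft-limit ideas discussed above.
// LimitResolver, effectiveLimit, and grant are made-up names for illustration.
public class LimitResolver {

    // The server-level hard cap always wins; the keyspace (paid-for) cap
    // bounds whatever the tenant configured at the column-family level.
    static long effectiveLimit(long serverCap, long keyspaceCap, long cfRequested) {
        return Math.min(serverCap, Math.min(keyspaceCap, cfRequested));
    }

    // Soft keyspace threshold: a tenant is guaranteed `min`, and may borrow
    // idle cluster capacity beyond that, but never past `max`.
    static long grant(long requested, long min, long max, long clusterSpare) {
        long guaranteed = Math.min(requested, min);
        long extraWanted = Math.max(0, Math.min(requested, max) - guaranteed);
        return guaranteed + Math.min(extraWanted, clusterSpare);
    }

    public static void main(String[] args) {
        // Keyspace cap (2000) binds even though the CF asked for 5000.
        System.out.println(effectiveLimit(10_000, 2_000, 5_000)); // 2000
        // Tenant guaranteed 1000, wants 1500, cluster has 300 spare: 1300.
        System.out.println(grant(1_500, 1_000, 2_000, 300)); // 1300
    }
}
```

The point of `grant` being clamped by both `max` and `clusterSpare` is exactly the borrowing behaviour described above: the threshold is soft between the guaranteed minimum and the paid-for maximum, but cluster-wide capacity (the server level) still has the final say.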