We use Cassandra for a multi-tenant application. Each tenant has its own set of tables, and we have 1592 tables in total in our production Cassandra cluster. It runs well and has no memory consumption issues, but the challenge confronting us is schema changes. We have a very large number of tables, and a significant fraction of them have as many as 50 columns, which adds up to 33099 rows in the schema_columns table in the system keyspace. Every time we make a schema change for all of our tenants, the whole cluster gets very busy, and the applications running against it need to be shut down for several hours to accommodate the change.
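To make the cost concrete, here is a minimal sketch of what such a change looks like for us today (the keyspace, table, and column names are made up for illustration): since no schema is shared between tenants, one logical change has to be issued as a separate ALTER per tenant table, and each statement schedules its own migration-stage task across the cluster.

    -- Hypothetical tenant keyspaces/tables, for illustration only:
    -- the same logical change must be repeated for every tenant's copy
    -- of the table, and each ALTER triggers its own migration-stage task.
    ALTER TABLE tenant_0001.user_events ADD last_login timestamp;
    ALTER TABLE tenant_0002.user_events ADD last_login timestamp;
    -- ...and so on across all 1592 tables.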
The way we solve this is with a new feature we developed called "template". There is a detailed description in the JIRA issue we opened:
https://issues.apache.org/jira/browse/CASSANDRA-7643

We have some performance results from our 15-node test cluster. Normally, creating 400 tables takes hours for all the migration-stage tasks to complete, but if we create those 400 tables with templates, *it just takes 1 to 2 seconds*. It also works great for ALTER TABLE.

[image: graph of migration-stage completion time vs. number of existing tables]

"table #" in the graph is the number of existing tables in user keyspaces. We created 400 more tables and measured the time it took for all tasks in the migration stage to complete. We also measured the migration task completion time for adding one column to a template, which adds that column to all the column families using that template.

We believe what we propose here can be very useful to other people in the Cassandra community as well. We have attached the patch to the JIRA, where you can also read the community feedback.

Thanks,
Cheng

On Tue, Aug 5, 2014 at 5:43 AM, Michal Michalski <michal.michal...@boxever.com> wrote:

> >> - Use a keyspace per customer
> >
> > These effectively amount to the same thing and they both fall foul of
> > the limit on the number of column families, so they do not scale.
>
> But then you can scale by moving some of the customers to a new cluster
> easily. If you keep everything in a single keyspace or - worse - if you do
> your multitenancy by prefixing row keys with customer ids of some kind, it
> won't be that easy, as you wrote later in your e-mail.
>
> M.
>
> Kind regards,
> Michał Michalski,
> michal.michal...@boxever.com
>
> On 5 August 2014 12:36, Phil Luckhurst <phil.luckhu...@powerassure.com>
> wrote:
>
>> Hi Mark,
>>
>> Mark Reddy wrote
>> > To segregate customer data, you could:
>> > - Use customer-specific column families under a single keyspace
>> > - Use a keyspace per customer
>>
>> These effectively amount to the same thing and they both fall foul of the
>> limit on the number of column families, so they do not scale.
>>
>> Mark Reddy wrote
>> > - Use the same column families and have a column that identifies the
>> > customer. On the application layer ensure that there are sufficient
>> > checks so one customer can't read another customer's data
>>
>> And while this gets around the column family limit, it does not allow the
>> same level of data segregation. For example, with a separate keyspace or
>> column families it is trivial to remove a single customer's data or move
>> that data to another system. With one set of column families for all
>> customers these types of actions become much more difficult, as any change
>> impacts all customers, but perhaps that's the price we have to pay to
>> scale.
>>
>> And I still think this needs to be made more prominent in the
>> documentation.
>>
>> Thanks
>> Phil
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596119.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive
>> at Nabble.com.