Hello Graham,

> Are the CFs different, or all the same schema?

The column families are different. Maybe with better data modelling we can combine a few of them.

> Are you contractually obligated to actually separate data into separate CFs?

No. It's just that we have several subsystems (around 100), and their data is different.

> It seems like you’d have a lot simpler time if you could use the part of the partition key to separate data.

I didn't understand this approach. Do you mean we combine some of them by adding an extra partition key column? But the column families have different schemas, don't they? Sorry if I am misunderstanding.

> Note also, I don’t know what disks you are using, but disk cache can be pretty helpful, and you haven’t allowed for any in your machine sizing. Of course that depends on your stored data volume also.

OK. This is new information for me; I will take it into account.

> Also hard to answer your questions without an idea of read/write load system wide, and indeed distribution across tenants.

The read/write load actually depends on the column family and the tenant. Some analytic jobs read data from certain CFs, while some online event jobs write a lot of data.

Thanks a lot for your insight. Anyway, I am still unclear on my previous questions.

Arun

On Wed, May 27, 2015 at 11:40 AM, graham sanderson <gra...@vast.com> wrote:

> Are the CFs different, or all the same schema? Are you contractually
> obligated to actually separate data into separate CFs? It seems like you’d
> have a lot simpler time if you could use the part of the partition key to
> separate data.
>
> Note also, I don’t know what disks you are using, but disk cache can be
> pretty helpful, and you haven’t allowed for any in your machine sizing. Of
> course that depends on your stored data volume also.
>
> Also hard to answer your questions without an idea of read/write load
> system wide, and indeed distribution across tenants.
>
>
> On May 26, 2015, at 10:32 PM, Arun Chaitanya <chaitan64a...@gmail.com>
> wrote:
>
> Good day everyone,
>
> I am very happy with the (almost) linear scalability offered by C*. We had
> a lot of problems with RDBMS.
>
> But I heard that C* has a limit on the number of column families that can be
> created in a single cluster, the reason being that each CF uses 1-2 MB of
> the JVM heap.
>
> In our use case, we have about 10000+ CFs and we want to support
> multi-tenancy (i.e. 10000 * number of tenants CFs).
>
> We are new to C*, and coming from an RDBMS background, I would like your
> advice on how to tackle this scenario.
>
> Our plan is to use the off-heap memtable approach:
> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>
> Each node in the cluster has the following configuration:
> 16 GB machine (8 GB Cassandra JVM + 2 GB system + 6 GB off-heap)
> IMO, this should be able to support 1000 CFs with no (or very little)
> impact on performance and startup time.
>
> We tackle multi-tenancy using different keyspaces (a solution I found on
> the web).
>
> Using this approach, we could have 10 clusters doing the job. (We are
> actually worried about the cost.)
>
> Can you please help us evaluate this strategy? I would like to hear the
> community's opinion on this.
>
> My major concerns are:
>
> 1. Is the off-heap strategy safe, and is my assumption of 16 GB supporting
> 1000 CFs right?
>
> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
> of column families increases even when we use multiple keyspaces.
>
> 3. I understand the complexity of using multiple clusters for a single
> application. The code base will get tightly coupled with the
> infrastructure. Is this the right approach?
>
> Any suggestion is appreciated.
>
> Thanks,
> Arun
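As a rough check of the sizing arithmetic in the original message, the sketch below multiplies the quoted 1-2 MB of heap per column family by the number of CFs, and computes how many clusters a fixed per-cluster CF budget implies. The 1-2 MB figure comes from the message and is not a measurement; the assumption that this overhead is paid on every node (because each node keeps memtables for every table) is an assumption, and the tenant count of 5 is purely hypothetical.

```python
from typing import Tuple

# Per-CF heap overhead range quoted in the original message (1-2 MB).
# This is a rule-of-thumb figure, not a measurement.
PER_CF_HEAP_MB_LOW = 1
PER_CF_HEAP_MB_HIGH = 2


def cf_heap_overhead_gb(num_cfs: int) -> Tuple[float, float]:
    """Return the (low, high) estimated heap overhead in GB for num_cfs CFs.

    ASSUMPTION: the overhead is paid on every node, since each node keeps
    memtables for every table in the cluster.
    """
    return (num_cfs * PER_CF_HEAP_MB_LOW / 1024,
            num_cfs * PER_CF_HEAP_MB_HIGH / 1024)


def total_cfs(cfs_per_tenant: int, tenants: int) -> int:
    """CF count when each tenant gets its own keyspace with a full set of CFs."""
    return cfs_per_tenant * tenants


def clusters_needed(num_cfs: int, cfs_per_cluster: int) -> int:
    """Ceiling division: clusters required if each holds at most cfs_per_cluster CFs."""
    return -(-num_cfs // cfs_per_cluster)


if __name__ == "__main__":
    heap_gb = 8   # JVM heap per node, from the message
    tenants = 5   # hypothetical tenant count, for illustration only
    for n in (1000, 10000, total_cfs(10000, tenants)):
        low, high = cf_heap_overhead_gb(n)
        print(f"{n} CFs: {low:.1f}-{high:.1f} GB of a {heap_gb} GB heap, "
              f"{clusters_needed(n, 1000)} cluster(s) at 1000 CFs each")
```

Under those assumptions, 1000 CFs cost roughly 1-2 GB of an 8 GB heap, whereas 10000 CFs would need roughly 10-20 GB, which is consistent with the plan of 10 clusters of 1000 CFs each. A keyspace per tenant multiplies the CF count by the number of tenants, so the cluster count grows with it.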
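One reading of Graham's suggestion (an interpretation, not confirmed in the thread) is to keep one table per schema and put the tenant identifier into the partition key, so the table count no longer grows with the number of tenants. In CQL terms that would be a composite partition key such as `PRIMARY KEY ((tenant_id, entity_id), ...)`. The sketch below models the idea with plain Python dicts; `SharedTable`, `tenant_id`, and `entity_id` are hypothetical names, and no Cassandra driver API is used. It addresses the per-tenant multiplication only, not merging CFs that have genuinely different schemas.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

PartitionKey = Tuple[str, str]  # hypothetical (tenant_id, entity_id)


class SharedTable:
    """In-memory model of one table shared by all tenants.

    Rows are grouped into partitions keyed by (tenant_id, entity_id), so
    tenants never read each other's rows, yet only one table definition exists.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.partitions: Dict[PartitionKey, List[Dict[str, Any]]] = defaultdict(list)

    def insert(self, tenant_id: str, entity_id: str, row: Dict[str, Any]) -> None:
        # The tenant is part of the partition key rather than a separate table.
        self.partitions[(tenant_id, entity_id)].append(row)

    def read(self, tenant_id: str, entity_id: str) -> List[Dict[str, Any]]:
        # Reads must supply the full partition key, tenant included.
        # .get() avoids creating empty partitions in the defaultdict.
        return list(self.partitions.get((tenant_id, entity_id), []))


def table_count(schemas: int, tenants: int, tenant_in_key: bool) -> int:
    """Tables needed: one per schema if tenants share tables, else one per schema per tenant."""
    return schemas if tenant_in_key else schemas * tenants
```

With the tenant in the key, any per-table heap overhead is paid once per schema rather than once per tenant. The trade-off is that isolation depends on the application always supplying `tenant_id`, instead of being enforced by separate keyspaces.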