> > I strongly advise against this approach.
> Jon, I think so too. But do you actually foresee any problems with this
> approach? I can think of a few. [I want to evaluate if we can live with
> this problem]

Just to be clear, I'm not saying this is a great approach; I AM saying that
it may be better than having 10000+ CFs, which was the original question (it
really depends on the use case, which wasn't well defined)… The map size
limit may be a problem, and then there is the CQL vs. Thrift question, which
could start a flame war; ideally, CQL maps should give you the same
flexibility as arbitrary Thrift columns.
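To make the map size limit and blob concerns concrete, here is a minimal CQL
sketch of the generic-table approach under discussion (a sketch only; the
table and column names are illustrative, not from this thread):

    CREATE TABLE generic_store (
        tenant_id  text,
        table_name text,
        row_key    blob,
        columns    map<text, blob>,  -- arbitrary column name -> serialized value
        PRIMARY KEY ((tenant_id, table_name), row_key)
    );

    -- Every value is an opaque blob, so typing and validation move into the
    -- application (the "everything needs to be a blob" drawback listed below).
    INSERT INTO generic_store (tenant_id, table_name, row_key, columns)
    VALUES ('t42', 'orders', 0x6f726465722d31303031,  -- 'order-1001' as bytes
            {'status': 0x73686970706564, 'total': 0x39392e3935});

A CQL collection is read in its entirety and is hard-capped in size (on the
order of 64K entries in the 2.x line), which is the map size limit referred
to above; arbitrary Thrift columns are not squeezed through any such
per-collection cap.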
> On Jun 1, 2015, at 9:44 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> > Sorry for this naive question, but how important is this tuning? Can this
> > have a huge impact in production?
>
> Massive. Here's a graph of when we did some JVM tuning at my previous
> company:
>
> http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png
>
> About an order of magnitude difference in performance.
>
> Jon
>
> On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya <chaitan64a...@gmail.com> wrote:
> Thanks Jon and Jack,
>
> > I strongly advise against this approach.
> Jon, I think so too. But do you actually foresee any problems with this
> approach? I can think of a few. [I want to evaluate if we can live with
> this problem]
> No more CQL.
> No data types; everything needs to be a blob.
> Limited clustering keys and default clustering order.
>
> > First off, different workloads need different tuning.
> Sorry for this naive question, but how important is this tuning? Can this
> have a huge impact in production?
>
> > You might want to consider a model where you have an application layer
> > that maps logical tenant tables into partition keys within a single
> > large Cassandra table, or at least a relatively small number of
> > Cassandra tables. It will depend on the typical size of your tenant
> > tables - very small ones would make sense within a single partition,
> > while larger ones should have separate partitions for a tenant's data.
> > The key here is that tables are expensive, but partitions are cheap and
> > scale very well with Cassandra.
> We are actually trying a similar approach, but we don't want to expose
> this to the application layer. We are attempting to hide it behind an API.
>
> > Finally, you said "10 clusters", but did you mean 10 nodes? You might
> > want to consider a model where you do indeed have multiple clusters,
> > where each handles a fraction of the tenants, since there is no need
> > for separate tenants to be on the same cluster.
> I meant 10 clusters. We want to split our tables across multiple clusters
> if the above approach is not possible. [But it seems to be very costly]
>
> Thanks,
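The "application layer that maps logical tenant tables into partition keys"
idea quoted above can be sketched roughly as follows (an assumed, illustrative
schema, not anyone's actual design: one shared table instead of one CF per
tenant table):

    CREATE TABLE tenant_rows (
        tenant_id     text,
        logical_table text,
        row_key       text,
        col_name      text,
        col_value     blob,
        PRIMARY KEY ((tenant_id, logical_table), row_key, col_name)
    );

    -- A small tenant table lives in a single partition and is fetched
    -- with one query:
    SELECT row_key, col_name, col_value
    FROM tenant_rows
    WHERE tenant_id = 't42' AND logical_table = 'orders';

For a large tenant table, the mapping layer would additionally fold a bucket
(e.g., a hash of row_key) into the partition key, so that one tenant table
spans several partitions instead of one oversized partition.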
> On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> How big is each of the tables - are they all fairly small or fairly large?
> Small as in no more than thousands of rows, or large as in tens of millions
> or hundreds of millions of rows?
>
> Small tables are not ideal for a Cassandra cluster, since the rows would be
> spread out across the nodes, even though it might make more sense for each
> small table to be on a single node.
>
> You might want to consider a model where you have an application layer that
> maps logical tenant tables into partition keys within a single large
> Cassandra table, or at least a relatively small number of Cassandra tables.
> It will depend on the typical size of your tenant tables - very small ones
> would make sense within a single partition, while larger ones should have
> separate partitions for a tenant's data. The key here is that tables are
> expensive, but partitions are cheap and scale very well with Cassandra.
>
> Finally, you said "10 clusters", but did you mean 10 nodes? You might want
> to consider a model where you do indeed have multiple clusters, where each
> handles a fraction of the tenants, since there is no need for separate
> tenants to be on the same cluster.
>
> -- Jack Krupansky
>
> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64a...@gmail.com> wrote:
> Good Day Everyone,
>
> I am very happy with the (almost) linear scalability offered by C*. We had
> a lot of problems with RDBMS.
>
> But I heard that C* has a limit on the number of column families that can
> be created in a single cluster, the reason being that each CF stores 1-2 MB
> on the JVM heap.
>
> In our use case, we have about 10000+ CFs, and we want to support
> multi-tenancy (i.e., 10000 * number of tenants).
>
> We are new to C*, and being from an RDBMS background, I would like to
> understand from your advice how to tackle this scenario.
>
> Our plan is to use the off-heap memtable approach:
> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>
> Each node in the cluster has the following configuration:
> 16 GB machine (8 GB Cassandra JVM + 2 GB system + 6 GB off-heap)
> IMO, this should be able to support 1000 CFs with little to no impact on
> performance and startup time.
>
> We tackle multi-tenancy using different keyspaces. (A solution I found on
> the web.)
>
> Using this approach, we can have 10 clusters doing the job. (We are
> actually worried about the cost.)
>
> Can you please help us evaluate this strategy? I want to hear the
> community's opinion on this.
>
> My major concerns are:
>
> 1. Is the off-heap strategy safe, and is my assumption of 16 GB supporting
> 1000 CFs right?
>
> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number of
> column families increases even when we use multiple keyspaces.
>
> 3. I understand the complexity of using multiple clusters for a single
> application; the code base will get tightly coupled with the
> infrastructure. Is this the right approach?
>
> Any suggestion is appreciated.
>
> Thanks,
> Arun
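For reference, the off-heap memtable plan linked above is driven by
cassandra.yaml settings along these lines in Cassandra 2.1 (the values below
are a sketch matching the 8 GB heap / 6 GB off-heap split described, not a
recommendation):

    # cassandra.yaml
    memtable_allocation_type: offheap_objects   # heap_buffers | offheap_buffers | offheap_objects
    memtable_heap_space_in_mb: 2048             # memtable space kept on the heap
    memtable_offheap_space_in_mb: 4096          # drawn from the 6 GB off-heap budget

Note that off-heap memtables move memtable data off the heap, not the
per-table bookkeeping, so the roughly 1-2 MB of heap per CF that motivated
the original question largely remains.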