> > I strongly advise against this approach.
> Jon, I think so too. But do you actually foresee any problems with this
> approach? I can think of a few. [I want to evaluate if we can live with
> this problem]

Just to be clear, I'm not saying this is a great approach; I AM saying that
it may be better than having 10000+ CFs, which was the original question (it
really depends on the use case, which wasn't well defined)… The map size
limit may be a problem, and then there is the CQL vs. Thrift question, which
could start a flame war; ideally, CQL maps should give you the same
flexibility as arbitrary Thrift columns.
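To make the map size limit and blob concerns concrete, here is a minimal CQL
sketch of the generic-table approach under discussion (a sketch only; the
table and column names are illustrative, not from this thread):

    CREATE TABLE generic_store (
        tenant_id  text,
        table_name text,
        row_key    blob,
        columns    map<text, blob>,  -- arbitrary column name -> serialized value
        PRIMARY KEY ((tenant_id, table_name), row_key)
    );

    -- Every value is an opaque blob, so typing and validation move into the
    -- application (the "everything needs to be a blob" drawback listed below).
    INSERT INTO generic_store (tenant_id, table_name, row_key, columns)
    VALUES ('t42', 'orders', 0x6f726465722d31303031,  -- 'order-1001' as bytes
            {'status': 0x73686970706564, 'total': 0x39392e3935});

A CQL collection is read in its entirety and is hard-capped in size (on the
order of 64K entries in the 2.x line), which is the map size limit referred
to above; arbitrary Thrift columns are not squeezed through any such
per-collection cap.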
> On Jun 1, 2015, at 9:44 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> > Sorry for this naive question, but how important is this tuning? Can this
> > have a huge impact in production?
>
> Massive. Here's a graph of when we did some JVM tuning at my previous
> company:
>
> http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png
>
> About an order of magnitude difference in performance.
>
> Jon
>
> On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya <chaitan64a...@gmail.com> wrote:
> Thanks Jon and Jack,
>
> > I strongly advise against this approach.
> Jon, I think so too. But do you actually foresee any problems with this
> approach? I can think of a few. [I want to evaluate if we can live with
> this problem]
> No more CQL.
> No data types; everything needs to be a blob.
> Limited clustering keys and default clustering order.
>
> > First off, different workloads need different tuning.
> Sorry for this naive question, but how important is this tuning? Can this
> have a huge impact in production?
>
> > You might want to consider a model where you have an application layer
> > that maps logical tenant tables into partition keys within a single
> > large Cassandra table, or at least a relatively small number of
> > Cassandra tables. It will depend on the typical size of your tenant
> > tables - very small ones would make sense within a single partition,
> > while larger ones should have separate partitions for a tenant's data.
> > The key here is that tables are expensive, but partitions are cheap and
> > scale very well with Cassandra.
> We are actually trying a similar approach, but we don't want to expose
> this to the application layer. We are attempting to hide it behind an API.
>
> > Finally, you said "10 clusters", but did you mean 10 nodes? You might
> > want to consider a model where you do indeed have multiple clusters,
> > where each handles a fraction of the tenants, since there is no need
> > for separate tenants to be on the same cluster.
> I meant 10 clusters. We want to split our tables across multiple clusters
> if the above approach is not possible. [But it seems to be very costly]
>
> Thanks,
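The "application layer that maps logical tenant tables into partition keys"
idea quoted above can be sketched roughly as follows (an assumed, illustrative
schema, not anyone's actual design: one shared table instead of one CF per
tenant table):

    CREATE TABLE tenant_rows (
        tenant_id     text,
        logical_table text,
        row_key       text,
        col_name      text,
        col_value     blob,
        PRIMARY KEY ((tenant_id, logical_table), row_key, col_name)
    );

    -- A small tenant table lives in a single partition and is fetched
    -- with one query:
    SELECT row_key, col_name, col_value
    FROM tenant_rows
    WHERE tenant_id = 't42' AND logical_table = 'orders';

For a large tenant table, the mapping layer would additionally fold a bucket
(e.g., a hash of row_key) into the partition key, so that one tenant table
spans several partitions instead of one oversized partition.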
> On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> How big is each of the tables - are they all fairly small or fairly large?
> Small as in no more than thousands of rows, or large as in tens of millions
> or hundreds of millions of rows?
>
> Small tables are not ideal for a Cassandra cluster, since the rows would be
> spread out across the nodes, even though it might make more sense for each
> small table to be on a single node.
>
> You might want to consider a model where you have an application layer that
> maps logical tenant tables into partition keys within a single large
> Cassandra table, or at least a relatively small number of Cassandra tables.
> It will depend on the typical size of your tenant tables - very small ones
> would make sense within a single partition, while larger ones should have
> separate partitions for a tenant's data. The key here is that tables are
> expensive, but partitions are cheap and scale very well with Cassandra.
>
> Finally, you said "10 clusters", but did you mean 10 nodes? You might want
> to consider a model where you do indeed have multiple clusters, where each
> handles a fraction of the tenants, since there is no need for separate
> tenants to be on the same cluster.
>
> -- Jack Krupansky
>
> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64a...@gmail.com> wrote:
> Good Day Everyone,
>
> I am very happy with the (almost) linear scalability offered by C*. We had
> a lot of problems with RDBMS.
>
> But I heard that C* has a limit on the number of column families that can
> be created in a single cluster, the reason being that each CF stores 1-2 MB
> on the JVM heap.
>
> In our use case, we have about 10000+ CFs, and we want to support
> multi-tenancy (i.e., 10000 * number of tenants).
>
> We are new to C*, and being from an RDBMS background, I would like to
> understand from your advice how to tackle this scenario.
>
> Our plan is to use the off-heap memtable approach:
> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>
> Each node in the cluster has the following configuration:
> 16 GB machine (8 GB Cassandra JVM + 2 GB system + 6 GB off-heap)
> IMO, this should be able to support 1000 CFs with little to no impact on
> performance and startup time.
>
> We tackle multi-tenancy using different keyspaces. (A solution I found on
> the web.)
>
> Using this approach, we can have 10 clusters doing the job. (We are
> actually worried about the cost.)
>
> Can you please help us evaluate this strategy? I want to hear the
> community's opinion on this.
>
> My major concerns are:
>
> 1. Is the off-heap strategy safe, and is my assumption of 16 GB supporting
> 1000 CFs right?
>
> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number of
> column families increases even when we use multiple keyspaces.
>
> 3. I understand the complexity of using multiple clusters for a single
> application; the code base will get tightly coupled with the
> infrastructure. Is this the right approach?
>
> Any suggestion is appreciated.
>
> Thanks,
> Arun
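For reference, the off-heap memtable plan linked above is driven by
cassandra.yaml settings along these lines in Cassandra 2.1 (the values below
are a sketch matching the 8 GB heap / 6 GB off-heap split described, not a
recommendation):

    # cassandra.yaml
    memtable_allocation_type: offheap_objects   # heap_buffers | offheap_buffers | offheap_objects
    memtable_heap_space_in_mb: 2048             # memtable space kept on the heap
    memtable_offheap_space_in_mb: 4096          # drawn from the 6 GB off-heap budget

Note that off-heap memtables move memtable data off the heap, not the
per-table bookkeeping, so the roughly 1-2 MB of heap per CF that motivated
the original question largely remains.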