Re: 10000+ CF support from Cassandra

Graham Sanderson Thu, 28 May 2015 03:08:07 -0700

Depending on your use case and data types (for example, if you can have a 
minimally nested JSON representation of the objects), you could go with a 
common map<string,string> representation where keys are top-level object 
fields and values are valid JSON literals as strings, e.g. unquoted 
primitives, quoted strings, unquoted arrays or other objects.


Each top-level field is then independently updatable - which may be beneficial 
(and allows you to trivially keep historical versions of objects if that is a 
requirement).
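
A minimal sketch of that map encoding, in Python rather than CQL, just to 
illustrate the idea. The function names and the sample object are made up for 
this example; in Cassandra the resulting dict would presumably be stored in a 
map<text,text> column (an assumption about the schema, not something specified 
above).

```python
import json

def to_field_map(obj):
    """Encode each top-level field as a JSON literal stored as a string."""
    return {key: json.dumps(value, sort_keys=True) for key, value in obj.items()}

def from_field_map(field_map):
    """Decode the string values back into Python objects."""
    return {key: json.loads(text) for key, text in field_map.items()}

# Hypothetical object with a primitive, a string, an array and a nested object.
order = {"id": 42, "customer": "acme", "tags": ["a", "b"], "address": {"city": "Oslo"}}
encoded = to_field_map(order)

# Updating one top-level field touches only that map entry.
encoded["customer"] = json.dumps("globex")

decoded = from_field_map(encoded)
```

Note that strings keep their JSON quotes inside the stored value, while numbers, 
arrays and nested objects are stored unquoted, so every value can be parsed 
back without knowing its type in advance.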

If you are updating the object in its entirety on save then simply store the 
entire object in a single cql field, and denormalize any search fields you may 
need (which you kinda want to do anyway)
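
As a rough sketch of this second option (again in Python, with hypothetical 
names): the whole object is serialized into one "body" value, and only the 
fields you need to search on are copied out next to it. The column names 
"id" and "body" and the choice of search fields are assumptions made for the 
example.

```python
import json

def to_row(obj, search_fields):
    """Build a row: the full object as one JSON string plus denormalized search fields."""
    row = {"id": obj["id"], "body": json.dumps(obj, sort_keys=True)}
    for name in search_fields:
        # Copy the searchable value out of the object so it can be queried directly.
        row[name] = obj.get(name)
    return row

# Hypothetical user object; only "email" is denormalized for lookups.
user = {"id": 7, "email": "a@example.com", "name": "Ann", "prefs": {"lang": "en"}}
row = to_row(user, ["email"])
restored = json.loads(row["body"])
```

The trade-off is that every save rewrites the whole body, and the copied 
search fields have to be rewritten together with it to stay consistent.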

Sent from my iPhone

> On May 28, 2015, at 1:49 AM, Arun Chaitanya <chaitan64a...@gmail.com> wrote:
> 
> Hello Jack,
> 
> > Column families? As opposed to tables? Are you using Thrift instead of 
> > CQL3? You should be focusing on the latter, not the former.
> We have an ORM developed in our company, which maps each DTO to a column 
> family. So, we have many column families. We are using CQL3.
> 
> > But either way, the general guidance is that there is no absolute limit of 
> > tables per se, but "low hundreds" is the recommended limit, regardless of 
> > how many keyspaces they may be divided between. More than that is an 
> > anti-pattern for Cassandra - maybe you can make it work for your 
> > application, but it isn't recommended.
> Are you saying that most Cassandra users don't have more than 200-300 column 
> families? Is this achieved through careful data modelling?
> 
> > A successful Cassandra deployment is critically dependent on careful data 
> > modeling - who is responsible for modeling each of these tables, you and a 
> > single, tightly-knit team with very common interests and very specific 
> > goals and SLAs or many different developers with different managers with 
> > different goals such as SLAs?
> The latter.
> 
> > When you say multi-tenant, are you simply saying that each of your 
> > organization's customers has their data segregated, or does each customer 
> > have direct access to the cluster?
> Each organization's data is in the same cluster. No customer has direct 
> access to the cluster.
> 
> Thanks,
> Arun
> 
>> On Wed, May 27, 2015 at 7:17 PM, Jack Krupansky <jack.krupan...@gmail.com> 
>> wrote:
>> Scalability of Cassandra refers primarily to number of rows and number of 
>> nodes - to add more data, add more nodes.
>> 
>> Column families? As opposed to tables? Are you using Thrift instead of CQL3? 
>> You should be focusing on the latter, not the former.
>> 
>> But either way, the general guidance is that there is no absolute limit of 
>> tables per se, but "low hundreds" is the recommended limit, regardless of 
>> how many keyspaces they may be divided between. More than that is 
>> an anti-pattern for Cassandra - maybe you can make it work for your 
>> application, but it isn't recommended.
>> 
>> A successful Cassandra deployment is critically dependent on careful data 
>> modeling - who is responsible for modeling each of these tables, you and a 
>> single, tightly-knit team with very common interests and very specific goals 
>> and SLAs or many different developers with different managers with different 
>> goals such as SLAs?
>> 
>> When you say multi-tenant, are you simply saying that each of your 
>> organization's customers has their data segregated, or does each customer 
>> have direct access to the cluster?
>> 
>> 
>> 
>> 
>> 
>> -- Jack Krupansky
>> 
>>> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64a...@gmail.com> 
>>> wrote:
>>> Good Day Everyone,
>>> 
>>> I am very happy with the (almost) linear scalability offered by C*. We had 
>>> a lot of problems with RDBMS.
>>> 
>>> But I heard that C* has a limit on the number of column families that can 
>>> be created in a single cluster.
>>> The reason is that each CF uses 1-2 MB of the JVM heap.
>>> 
>>> In our use case, we have about 10000+ CF and we want to support 
>>> multi-tenancy.
>>> (i.e. 10000 * number of tenants)
>>> 
>>> We are new to C* and come from an RDBMS background, so I would like your 
>>> advice on how to tackle this scenario.
>>> 
>>> Our plan is to use the off-heap memtable approach.
>>> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>>> 
>>> Each node in the cluster has the following configuration:
>>> 16 GB machine (8 GB Cassandra JVM + 2 GB system + 6 GB off-heap)
>>> IMO, this should be able to support 1000 CFs with little or no impact on 
>>> performance and startup time.
>>> 
>>> We tackle multi-tenancy using different keyspaces. (A solution I found on 
>>> the web.)
>>> 
>>> Using this approach we can have 10 clusters doing the job. (We actually are 
>>> worried about the cost)
>>> 
>>> Can you please help us evaluate this strategy? I want to hear the 
>>> community's opinion on this.
>>> 
>>> My major concerns are:
>>> 
>>> 1. Is the off-heap strategy safe, and is my assumption that 16 GB can 
>>> support 1000 CFs right?
>>> 
>>> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number of 
>>> column families increases even when we use multiple keyspaces.
>>> 
>>> 3. I understand the complexity of using multiple clusters for a single 
>>> application. The code base will get tightly coupled with infrastructure. Is 
>>> this the right approach?
>>> 
>>> Any suggestion is appreciated.
>>> 
>>> Thanks,
>>> Arun
>
