Depending on your use case and data types (for example, if you can have a minimally nested JSON representation of the objects), you could go with a common map<string,string> representation where keys are top-level object fields and values are valid JSON literals stored as strings, e.g. unquoted primitives, quoted strings, unquoted arrays, or other objects.
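A minimal sketch of that map representation, in Python with only the standard json module. The helper names (to_field_map, from_field_map), the sample object, and the table mentioned in the comments are all made up for illustration; this is one way it could look, not a prescribed schema.

```python
import json
from typing import Any, Dict


def to_field_map(obj: Dict[str, Any]) -> Dict[str, str]:
    """Encode each top-level field of obj as a JSON literal string.

    The result is shaped for a map<text, text> column, e.g. in a
    hypothetical table:
        CREATE TABLE objects (id uuid PRIMARY KEY, fields map<text, text>);
    """
    return {
        key: json.dumps(value, separators=(",", ":"), sort_keys=True)
        for key, value in obj.items()
    }


def from_field_map(fields: Dict[str, str]) -> Dict[str, Any]:
    """Decode a map of JSON literal strings back into a plain object."""
    return {key: json.loads(text) for key, text in fields.items()}


# Hypothetical sample object with one level of nesting.
user = {
    "name": "Ada",
    "age": 36,
    "tags": ["admin", "ops"],
    "address": {"city": "London"},
}

fields = to_field_map(user)
for key, text in fields.items():
    print(key, "=>", text)
```

Strings keep their JSON quotes, while numbers, arrays, and nested objects are stored unquoted, so every value can be parsed back on its own without knowing the field's type in advance.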
Each top-level field is then independently updatable, which may be beneficial (and allows you to trivially keep historical versions of objects, if that is a requirement).

If you are updating the object in its entirety on save, then simply store the entire object in a single CQL field and denormalize any search fields you may need (which you kind of want to do anyway).

> On May 28, 2015, at 1:49 AM, Arun Chaitanya <chaitan64a...@gmail.com> wrote:
>
> Hello Jack,
>
> > Column families? As opposed to tables? Are you using Thrift instead of
> > CQL3? You should be focusing on the latter, not the former.
> We have an ORM developed in our company, which maps each DTO to a column
> family. So, we have many column families. We are using CQL3.
>
> > But either way, the general guidance is that there is no absolute limit of
> > tables per se, but "low hundreds" is the recommended limit, regardless of
> > whether how many key spaces they may be divided
> > between. More than that is an anti-pattern for Cassandra - maybe you can
> > make it work for your application, but it isn't recommended.
> You want to say that most cassandra users don't have more than 2-300 column
> families? Is this achieved through careful data modelling?
>
> > A successful Cassandra deployment is critically dependent on careful data
> > modeling - who is responsible for modeling each of these tables, you and a
> > single, tightly-knit team with very common interests and very specific
> > goals and SLAs or many different developers with different managers with
> > different goals such as SLAs?
> The latter.
>
> > When you say multi-tenant, are you simply saying that each of your
> > organization's customers has their data segregated, or does each customer
> > have direct access to the cluster?
> Each organization's data is in the same cluster. No customer doesn't have
> access to the cluster.
>
> Thanks,
> Arun
>
>> On Wed, May 27, 2015 at 7:17 PM, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>> Scalability of Cassandra refers primarily to number of rows and number of
>> nodes - to add more data, add more nodes.
>>
>> Column families? As opposed to tables? Are you using Thrift instead of CQL3?
>> You should be focusing on the latter, not the former.
>>
>> But either way, the general guidance is that there is no absolute limit of
>> tables per se, but "low hundreds" is the recommended limit, regardless of
>> whether how many key spaces they may be divided between. More than that is
>> an anti-pattern for Cassandra - maybe you can make it work for your
>> application, but it isn't recommended.
>>
>> A successful Cassandra deployment is critically dependent on careful data
>> modeling - who is responsible for modeling each of these tables, you and a
>> single, tightly-knit team with very common interests and very specific goals
>> and SLAs or many different developers with different managers with different
>> goals such as SLAs?
>>
>> When you say multi-tenant, are you simply saying that each of your
>> organization's customers has their data segregated, or does each customer
>> have direct access to the cluster?
>>
>> -- Jack Krupansky
>>
>>> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64a...@gmail.com>
>>> wrote:
>>> Good Day Everyone,
>>>
>>> I am very happy with the (almost) linear scalability offered by C*. We had
>>> a lot of problems with RDBMS.
>>>
>>> But, I heard that C* has a limit on number of column families that can be
>>> created in a single cluster.
>>> The reason being each CF stores 1-2 MB on the JVM heap.
>>>
>>> In our use case, we have about 10000+ CF and we want to support
>>> multi-tenancy.
>>> (i.e 10000 * no of tenants)
>>>
>>> We are new to C* and being from RDBMS background, I would like to
>>> understand how to tackle this scenario from your advice.
>>>
>>> Our plan is to use Off-Heap memtable approach.
>>> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>>>
>>> Each node in the cluster has following configuration
>>> 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
>>> IMO, this should be able to support 1000 CF with no(very less) impact on
>>> performance and startup time.
>>>
>>> We tackle multi-tenancy using different keyspaces.(Solution I found on the
>>> web)
>>>
>>> Using this approach we can have 10 clusters doing the job. (We actually are
>>> worried about the cost)
>>>
>>> Can you please help us evaluate this strategy? I want to hear communities
>>> opinion on this.
>>>
>>> My major concerns being,
>>>
>>> 1. Is Off-Heap strategy safe and my assumption of 16 GB supporting 1000 CF
>>> right?
>>>
>>> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number of
>>> column families increase even when we use multiple keyspace.
>>>
>>> 3. I understand the complexity using multi-cluster for single application.
>>> The code base will get tightly coupled with infrastructure. Is this the
>>> right approach?
>>>
>>> Any suggestion is appreciated.
>>>
>>> Thanks,
>>> Arun
>