> But when modeling the application I understand so far that ColumnFamily is
> sort of "table with objects". In typical application there are lot of tables
> so why is the mindset set towards having more or less 10 ColumnFamilies?
> Even in this trivial example there are already 7 CFs
> http://www.rackspace.com/cloud/blog/2010/05/12/cassandra-by-example/.
> So what is best practice to create applications using Cassandra? Divide
> application to more parts and create Keyspace for each one of them?

Keyspaces don't really help. You can have 100 column families if you want; if you're worried about overhead, whatever overhead you do get will tend to be indirect in nature: smaller memtables, more files on disk, etc. If you have a legitimate use case for N column families, then that's the way to do it.

It is just that the tendency is towards fewer CF:s. For data that is accessed together, the idea is often to put it into a single CF under the same row key rather than modeling it RDBMS-style. Instead of a fully normalized system with foreign keys, you tend to group data together and keep it in fewer column families, often with the same row key across multiple CF:s instead of foreign keys.

I guess to re-phrase: in Cassandra the idea is that you model your data after the expected read and write behavior rather than optimizing for normalization. This tends to mean fewer CF:s rather than lots and lots of CF:s, but it does depend on the use case. Sometimes it can mean more CF:s instead of fewer (if you split data into different CF:s to separate often-read data from seldom-read data, or if you de-normalize to provide multiple materialized views of the same data).

So I think the right approach is to look at what the correct data model would be. If that somehow results in an extreme number of CF:s, then re-evaluate based on the specific use case. Maybe that is truly what you need, maybe not. But in any case, the primary concern should not be any potential hard limit but rather the performance implications of how data is stored. If, after reaching a conclusion, the number of CF:s is high enough that you are concerned about hitting some kind of artificial limit or unintended side effect, you can look at the situation then.

Suppose there is a hard limit. Suppose there is a piece of code somewhere that says "you can only have up to 100 column families". Even if that were the case, it would be pretty useless to respond to the OP's question with that information.
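
As a rough illustration of the "same row key across multiple CF:s instead of foreign keys" idea mentioned earlier, here is a minimal sketch. It makes several assumptions: plain Python dicts stand in for column families, no Cassandra client is involved, and the names `users`, `timeline`, `add_user`, `add_post` and `read_timeline` are invented for the example rather than taken from any real API. The point is only that the write path does the denormalization so that the expected read becomes a single lookup by row key.

```python
from collections import defaultdict

# Each "column family" is modeled as: row key -> {column name: value}.
# Plain dicts stand in for Cassandra here; this is not a client API.
users = defaultdict(dict)      # profile data, keyed by user id
timeline = defaultdict(dict)   # denormalized view of posts, same row key

def add_user(user_id, name, email):
    users[user_id].update({"name": name, "email": email})

def add_post(author_id, follower_ids, post_id, body):
    """Copy the post into the timeline row of every reader (write-time fan-out)."""
    author_name = users[author_id]["name"]
    for reader_id in [author_id, *follower_ids]:
        # The column name is the post id; the value holds all the read needs.
        timeline[reader_id][post_id] = f"{author_name}: {body}"

def read_timeline(user_id, limit=10):
    """One row lookup by key, newest post first; no join against other tables."""
    row = timeline[user_id]
    return [row[post_id] for post_id in sorted(row, reverse=True)[:limit]]

add_user("alice", "Alice", "alice@example.com")
add_user("bob", "Bob", "bob@example.com")
add_post("alice", ["bob"], 1, "hello")
add_post("bob", ["alice"], 2, "hi there")
print(read_timeline("bob"))  # ['Bob: hi there', 'Alice: hello']
```

Two structures cover what a normalized schema might spread over users, posts and followers tables. Whether that trade (more work and more copies on write, cheaper reads) is right still depends on the actual read and write pattern.
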
If the use case *truly* calls for 100+ CF:s, then you have to look at the actual effects of that. What was the hard limit, and why is it there? Is it just a matter of removing a hard limit that had no real purpose? The answer to whether or not the limit is truly "hard" in any given situation is probably going to be at least as dependent on that situation as it is on the code that enforces the limit...

(Obviously, though, it doesn't hurt to know of actual hard artificial limits, and as I said I'm not aware of any.)

-- 
/ Peter Schuller