On 06/23/11 09:43, David Boxenhorn wrote:
> I think very high uptime, and very low data loss is achievable in
> Cassandra, but, for new users there are TONS of gotchas. You really
> have to know what you're doing, and I doubt that many people acquire
> that knowledge without making a lot of mistakes.
>
> I see above that most people are talking about configuration issues.
> But, the first thing that you will probably do, before you have any
> experience with Cassandra(!), is architect your system. Architecture
> is not easily changed when you bump into a gotcha, and for some reason
> you really have to search the literature well to find out about them.
> So, my contributions:
>
> The too many CFs problem. Cassandra doesn't do well with many column
> families. If you come from a relational world, a real application can
> easily have hundreds of tables. Even if you combine them into entities
> (which is the Cassandra way), you can easily end up with dozens of
> entities. The most natural thing for someone with a relational
> background is have one CF per entity, plus indexes according to your
> needs. Don't do it. You need to store multiple entities in the same
> CF. Group them together according to access patterns (i.e. when you
> use X, you probably also need Y), and distinguish them by adding a
> prefix to their keys (e.g. entityName@key).
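For anyone who wants to see the entityName@key prefixing spelled out, here is a rough sketch in Python. The helper names (make_row_key, split_row_key) and the plain dict standing in for a shared CF are made up for illustration; no Cassandra client API is used here, and a real setup would need its own key and partitioner considerations.

```python
from typing import Dict, Tuple

SEPARATOR = "@"  # separator taken from the entityName@key example above


def make_row_key(entity_name: str, key: str) -> str:
    """Build a row key of the form entityName@key (hypothetical helper)."""
    if SEPARATOR in entity_name:
        # The entity name must not contain the separator, or the key
        # could not be split back unambiguously.
        raise ValueError("entity name must not contain %r" % SEPARATOR)
    return entity_name + SEPARATOR + key


def split_row_key(row_key: str) -> Tuple[str, str]:
    """Split entityName@key back into (entityName, key)."""
    entity_name, sep, key = row_key.partition(SEPARATOR)
    if not sep:
        raise ValueError("row key has no entity prefix: %r" % row_key)
    return entity_name, key


# A plain dict stands in for one shared column family (assumption:
# this only illustrates the key layout, not real storage).
shared_cf: Dict[str, Dict[str, str]] = {}
shared_cf[make_row_key("user", "42")] = {"name": "alice"}
shared_cf[make_row_key("profile", "42")] = {"city": "Oslo"}
```

Both entities share the id 42 but land under distinct row keys (user@42 and profile@42), so they can live in the same CF without colliding.
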
While avoiding too many CFs is a good idea, I would also advise against a very large CF. Keeping the size of each CF down helps speed up repair and compaction.

-- Karl