>> 4. ) Does the larger no of column families has any impact on the
>> performance(I read about it somewhere)? Should information for a particular
>> row key be split in multiple column families according to the specific query
>> demands or should all data related to a particular row key be kept together
>> in a single column family ?
>
> A higher number of column families requires more memory to be used and more
> compactions to occur. I can't answer the rest of the question accurately
> without more detail on the particular use case.

Though in general I would say that it is worth considering. In particular,
if you have certain data that is accessed much more frequently than other
data (especially if the "other data" is large), the benefit of the improved
cache locality from keeping the frequently accessed data separate can be
large (assuming greater-than-RAM data sets).

Another concern might be if some parts are constantly updated or deleted
while other parts are mostly append-only. The compaction needs of the
frequently overwritten/removed data may be higher, which may also be a
reason to separate it out.

Whether or not rows should be split in a specific use case will of course
depend on the details, as always.

-- 
/ Peter Schuller
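
As a rough illustration of the cache locality point above, here is a back-of-envelope sketch. Everything in it is an assumption made up for the example: the row count, the per-row sizes, the cache size, and above all the crude model that the cache holds whole rows when hot and cold data share a column family, and only the hot bytes when they are split. Real behaviour depends on page and block granularity, the row and key caches, and the actual access pattern.

```python
def cached_fraction(rows: int, bytes_per_row: int, cache_bytes: int) -> float:
    """Fraction of rows whose data fits in the cache, capped at 1.0.

    Crude model: every row costs `bytes_per_row` of cache to keep resident.
    """
    return min(1.0, cache_bytes / (rows * bytes_per_row))


# All numbers below are hypothetical, chosen only for illustration.
ROWS = 100_000_000          # number of row keys
HOT_BYTES = 200             # frequently read data per row
COLD_BYTES = 10_000         # rarely read (large) data per row
CACHE_BYTES = 32 * 10**9    # memory available for caching

# Single column family: hot and cold data are assumed to be cached together.
mixed = cached_fraction(ROWS, HOT_BYTES + COLD_BYTES, CACHE_BYTES)

# Split column families: only the hot column family competes for the cache.
split = cached_fraction(ROWS, HOT_BYTES, CACHE_BYTES)

print(f"single column family: {mixed:.1%} of hot data cacheable")
print(f"split column families: {split:.1%} of hot data cacheable")
```

Under these made-up numbers the hot data set is about 20 GB and fits in the cache once it is separated, while only a few percent of it stays resident when it shares space with the large cold data. The extra memory and compaction cost of the additional column family, mentioned in the quoted reply, is not modelled here.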