>> 4.) Does a larger number of column families have any impact on
>> performance (I read about it somewhere)? Should information for a particular
>> row key be split into multiple column families according to the specific query
>> demands, or should all data related to a particular row key be kept together
>> in a single column family?
>
> A higher number of column families requires more memory to be used and more
> compactions to occur.  I can't answer the rest of the question accurately
> without more detail on the particular use case.

Though in general I would say that splitting a row's data across column
families is worth considering. In particular, if you have certain data
that is accessed a lot more frequently than other data (especially if
the "other data" is large), the benefit from the improved cache locality
of keeping the frequently accessed data separate can be significant
(assuming greater-than-RAM data sets). Another concern might be if you
have some parts that are constantly updated or deleted, while some
other part is mostly append-only. The compaction needs of the
frequently overwritten/removed data may be higher, which may also be a
reason to separate it out.
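
To put a rough number on the cache locality point, here is a
back-of-envelope sketch in Python. It is only a toy model, not a
description of how Cassandra or the OS page cache actually behaves: it
assumes uniform access over rows, and that caching a row's hot data in
a shared column family drags the cold data into cache with it. All
sizes and the helper name cached_row_fraction are made up for
illustration.

```python
def cached_row_fraction(cache_bytes, num_rows, bytes_per_row):
    """Fraction of rows that fit in cache, assuming uniform access."""
    rows_that_fit = cache_bytes // bytes_per_row
    return min(1.0, rows_that_fit / num_rows)


# Hypothetical numbers, chosen only for illustration.
CACHE_BYTES = 8 * 1024**3   # 8 GiB available for caching
NUM_ROWS = 100_000_000      # data set larger than RAM
HOT_BYTES = 100             # small, frequently read data per row
COLD_BYTES = 2_000          # large, rarely read data per row

# One column family: each cached row costs hot + cold bytes.
together = cached_row_fraction(CACHE_BYTES, NUM_ROWS, HOT_BYTES + COLD_BYTES)

# Hot data in its own column family: each cached row costs hot bytes only.
separate = cached_row_fraction(CACHE_BYTES, NUM_ROWS, HOT_BYTES)

print(f"hot+cold in one column family: {together:.1%} of rows cacheable")
print(f"hot data in its own family:    {separate:.1%} of rows cacheable")
```

Under these assumptions the separated layout can keep the hot data for
about 21 times as many rows in cache, simply because each cached row is
21 times smaller. Real gains depend on access patterns and on how the
data is laid out on disk.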

Whether or not rows should be split in your specific use case will of
course depend on the details, as always.

-- 
/ Peter Schuller
