On 2/4/11 11:58 PM, Ertio Lew wrote:
Yes, one disadvantage I see of having more CFs, in terms of memory
utilization, is this:

If a CF is written less often than the other CFs, its memtable
occupies memory until it is flushed. That memory could have been put
to much better use by a CF that is heavily written and read. And if
you make the flush thresholds smaller, more compactions will be
needed.


One more disadvantage here is that with CFs that vary widely in write rate you can also end up with fragmented commit logs, which in some cases we have seen actually fill up the commit log partition. As a consequence, one thing to consider would be to lower the commit log flush threshold (in minutes) for the column families that do not see heavy use.
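
A toy Python model of the effect described above. It is not Cassandra's actual code. It assumes, as a simplification, that a commit log segment can be deleted only once every CF with writes in it has flushed them. The function name, the CF names, and the numbers are all made up for illustration.

```python
def retained_segments(writes, flushed_through):
    """Return ids of commit log segments that cannot be deleted yet.

    writes: iterable of (segment_id, cf_name) pairs.
    flushed_through: dict cf_name -> highest segment id whose writes
        for that CF have been flushed to disk (-1 if never flushed).
    """
    # Collect which CFs wrote into each segment.
    cfs_in_segment = {}
    for seg_id, cf in writes:
        cfs_in_segment.setdefault(seg_id, set()).add(cf)

    retained = []
    for seg_id in sorted(cfs_in_segment):
        # A CF pins the segment while its data there is still unflushed.
        pinned_by = [cf for cf in cfs_in_segment[seg_id]
                     if flushed_through.get(cf, -1) < seg_id]
        if pinned_by:
            retained.append(seg_id)
    return retained


# Hypothetical workload: "busy" writes into segments 0-4,
# "quiet" wrote once into segment 0 and has not flushed since.
writes = [(i, "busy") for i in range(5)] + [(0, "quiet")]
flushed = {"busy": 3, "quiet": -1}
pinned = retained_segments(writes, flushed)
```

In this sketch segment 0 stays on disk only because of the single unflushed write from `quiet`, while segments 1 to 3 can be removed and segment 4 still holds unflushed writes from `busy`. Flushing `quiet` sooner would release segment 0.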



On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew <ertio...@gmail.com> wrote:
Thanks, Tyler!

I could not fully understand why more column families would mean more
memory. If you have parameters like memtable_throughput and
memtable_operations under control, which are set on a per-column-family
basis, then you can directly control and adjust memory use by splitting
the memory space between the two CFs in proportion to what you would
have used in a single CF.
Hence there should be no extra memory consumption for multiple CFs
that have been split from a single one, should there?
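
The proportional split can be worked through with a few lines of Python. This is only a sketch: `split_budget` is a hypothetical helper, not a Cassandra setting, and the 128 MB budget and the write rates are invented numbers.

```python
def split_budget(total_mb, write_rates_mb_per_min):
    """Split a memtable budget across CFs in proportion to write rate."""
    total_rate = sum(write_rates_mb_per_min.values())
    return {cf: total_mb * rate / total_rate
            for cf, rate in write_rates_mb_per_min.items()}


# Hypothetical numbers: one CF with a 128 MB threshold split into two.
rates = {"cf_a": 6.0, "cf_b": 2.0}   # MB written per minute (assumed)
shares = split_budget(128, rates)    # per-CF memtable thresholds in MB

# Minutes until each memtable reaches its threshold and flushes.
fill_minutes = {cf: shares[cf] / rates[cf] for cf in rates}
```

With these numbers the thresholds come out as 96 MB and 32 MB, which sum to the original 128 MB, and both memtables reach their threshold after 16 minutes. That holds only as long as the write rates stay in the assumed ratio.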

Regarding compactions, I think that even if there are more of them,
the SSTable files to be compacted are smaller, as the data has been
split in two.
So: more compactions, but smaller ones too!


Then, given the same amount of data, how can a greater number of
column families be a bad option (if you split the values of the memory
parameters proportionately)?

--
Regards,
Ertio





On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> I read somewhere that a larger number of column families is not a good
> idea, as it consumes more memory and causes more compactions.
This is primarily true, but not in every case.

> But the caching requirements may be different as they cater to two
> different features.
This is a great reason to *not* merge them.  Besides the key and row caches,
don't forget about the OS buffer cache.

> Is it recommended to merge these two column families into one? Thoughts?
No, this sounds like an anti-pattern to me.  The overhead from having two
separate CFs is not that high.

--
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library

