If you had one big cache, wouldn't it end up populated mostly with
frequently accessed rows, and only sparsely with rarely accessed rows?

In fact, wouldn't one big cache dynamically and automatically give you
exactly what you want? If you try to partition the same amount of memory
manually, by guesswork, among many tables, aren't you always going to do a
worse job?
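
To make the question concrete, here is a rough sketch of the comparison in Python. Everything in it is an assumption chosen for illustration: the LRUCache class, the synthetic workload (one "hot" table with 80 keys read every round, one "cold" table where every read is a new key), the capacity of 100 entries, and the naive 50/50 split. It is a toy LRU model, not how Cassandra's row cache or the OS page cache actually work.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
        else:
            # Miss: insert, then evict the least recently used entry if full.
            self.misses += 1
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)


def workload(rounds=20, hot_keys=80, cold_per_round=5):
    """Hypothetical access pattern: every round reads all hot keys once,
    then a few cold keys that are never read again."""
    cold_id = 0
    for _ in range(rounds):
        for k in range(hot_keys):
            yield ("hot", k)
        for _ in range(cold_per_round):
            yield ("cold", cold_id)
            cold_id += 1


def run_shared(total_capacity=100):
    """One big cache shared by both tables."""
    cache = LRUCache(total_capacity)
    for table, key in workload():
        cache.access((table, key))
    return cache.hits, cache.hits + cache.misses


def run_partitioned(total_capacity=100):
    """Same memory, split evenly between the tables by guesswork."""
    half = total_capacity // 2
    caches = {"hot": LRUCache(half), "cold": LRUCache(half)}
    for table, key in workload():
        caches[table].access(key)
    hits = sum(c.hits for c in caches.values())
    total = sum(c.hits + c.misses for c in caches.values())
    return hits, total


if __name__ == "__main__":
    print("shared:     ", run_shared())
    print("partitioned:", run_partitioned())
```

In this particular workload the shared cache hits on every hot read after the first round, while the even split leaves the hot table with too little room and it thrashes. A split that happened to give the hot table at least 80 entries would do just as well as the shared cache, so this only illustrates the cost of guessing wrong; it says nothing about other access patterns.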


On Sun, May 1, 2011 at 10:43 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> On Sun, May 1, 2011 at 2:16 PM, Jake Luciani <jak...@gmail.com> wrote:
>
>>
>>
>> On Sun, May 1, 2011 at 2:58 PM, shimi <shim...@gmail.com> wrote:
>>
>>> On Sun, May 1, 2011 at 9:48 PM, Jake Luciani <jak...@gmail.com> wrote:
>>>
>>>> If you have N column families you need N * memtable size of RAM to
>>>> support this.  If that's not an option you can merge them into one as you
>>>> suggest but then you will have much larger SSTables, slower compactions,
>>>> etc.
>>>
>>>
>>>
>>>> I don't necessarily agree with Tyler that the OS cache will be less
>>>> effective... But I do agree that if the sizes of sstables are too large for
>>>> you then more hardware is the solution...
>>>
>>>
>>> If you merge CFs which are hardly accessed with one which is accessed
>>> frequently, then when you read the SSTable you load hardly accessed data
>>> into the OS cache.
>>>
>>
>>  Only the rows or portions of rows you read will be loaded into the OS
>> cache.  Just because different rows are in the same file doesn't mean the
>> entire file is loaded into the OS cache.  The bloom filter and index file
>> will be loaded but those are not large files.
>>
>
> Right -- it does depend on the page size and the average amount of data
> read.  The effect will be more pronounced on CFs with small rows than those
> with wide rows.
>
