Re: Calculate number of nodes required based on data

Adi Wed, 07 Sep 2011 11:32:03 -0700

On Wed, Sep 7, 2011 at 2:09 PM, Hefeng Yuan <hfy...@rhapsody.com> wrote:


> We didn't change MemtableThroughputInMB or min/maxCompactionThreshold; they're
> 499/4/32.
> As for why we're flushing at ~9m, I guess it has to do with this:
> http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
> The only parameter I tried to play with is
> compaction_throughput_mb_per_sec; I tried halving it and doubling it, but
> neither helped avoid the simultaneous compactions on nodes.
>
> I agree that we don't necessarily need to add nodes, as long as we have a
> way to avoid simultaneous compaction on 4+ nodes.
>
> Thanks,
> Hefeng
>
>
>
Can you check the logs for a line like this:
...... Memtable.java (line 157) Writing
Memtable-<ColumnFamilyName>@1151031968(67138588 bytes, 47430 operations)
to see the bytes/operations at which the column family gets flushed. If
you are hitting the operations threshold, you can try raising it to a
high number. Flushing at ~9 MB against a 499 MB limit means the operations
threshold is being hit at less than 2% of the size threshold. I would try
bumping up memtable_operations substantially.
The default is 1.1624999999999999 (in millions). Try 10 or 20 and see whether
your CF flushes at a larger size. Keep adjusting it until the frequency/size of
flushing becomes satisfactory, which should hopefully reduce the compaction
overhead.

-Adi
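
To compare flush sizes across many log entries, the "Writing Memtable-..." line quoted above can be pulled out with a short script. This is only a sketch: the column family name "Users", the function names, and the limits passed in are illustrative assumptions, and it assumes the logged byte count is comparable to the configured size limit (the thelastpickle.com post linked earlier in the thread covers how memtables are actually measured).

```python
import re

# Matches the flush message shown above, e.g.
#   Writing Memtable-Users@1151031968(67138588 bytes, 47430 operations)
# "Users" is a made-up column family name used for illustration.
FLUSH_RE = re.compile(
    r"Writing Memtable-(?P<cf>[^@\s]+)@\d+"
    r"\((?P<bytes>\d+) bytes, (?P<ops>\d+) operations\)"
)


def parse_flushes(log_text):
    """Return (column_family, bytes, operations) for every flush message."""
    # The message may be wrapped across lines (as in the email), so collapse
    # all whitespace runs into single spaces before matching.
    normalized = " ".join(log_text.split())
    return [
        (m["cf"], int(m["bytes"]), int(m["ops"]))
        for m in FLUSH_RE.finditer(normalized)
    ]


def flush_fractions(flushed_bytes, flushed_ops, throughput_mb, operations_millions):
    """Fraction of the size limit and of the operations limit reached at flush.

    Assumption: the limits are given in MB and in millions of operations,
    as in the settings discussed in this thread.
    """
    size_fraction = flushed_bytes / (throughput_mb * 1024 * 1024)
    ops_fraction = flushed_ops / (operations_millions * 1_000_000)
    return size_fraction, ops_fraction


if __name__ == "__main__":
    sample = (
        "...... Memtable.java (line 157) Writing\n"
        "Memtable-Users@1151031968(67138588 bytes, 47430 operations)"
    )
    for cf, nbytes, nops in parse_flushes(sample):
        size_f, ops_f = flush_fractions(nbytes, nops, 499, 1.1625)
        print(f"{cf}: {size_f:.1%} of size limit, {ops_f:.1%} of operations limit")
```

Whichever fraction sits close to 100% at flush time suggests which threshold is triggering the flush; if neither does, something else (for example a time-based flush) may be responsible.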

> On Sep 7, 2011, at 10:51 AM, Adi wrote:
>
>
> On Wed, Sep 7, 2011 at 1:09 PM, Hefeng Yuan <hfy...@rhapsody.com> wrote:
>
>> Adi,
>>
>> The reason we're attempting to add more nodes is to solve the
>> long/simultaneous compactions, i.e. the performance issue, not the storage
>> issue yet.
>> We have RF 5 and CL QUORUM for reads and writes. We currently have 6 nodes,
>> and when 4 nodes are compacting at the same time, we're screwed,
>> especially on reads, since every read will hit at least one compacting node
>> anyway.
>> My assumption is that if we add more nodes, each node will have less load
>> and therefore need less compaction, and will probably compact faster,
>> so that we never have 4+ nodes compacting simultaneously.
>>
>> Any suggestions on how to calculate how many more nodes to add? Or, more
>> generally, how to plan the number of nodes required from a performance
>> perspective?
>>
>> Thanks,
>> Hefeng
>>
>>
>>
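
The RF 5 / QUORUM arithmetic in the message above can be checked with a few lines. This is a sketch under simple assumptions: the function names are hypothetical, every node is either "compacting" or "idle", and a key's replicas are placed as favourably as possible (as many idle replicas as the cluster allows).

```python
def quorum(replication_factor):
    """Replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def forced_compacting_replicas(nodes, replication_factor, compacting):
    """Minimum number of compacting nodes that any QUORUM request must touch.

    Best case for a key: its replicas include as many idle nodes as exist,
    capped at the replication factor. Whatever the quorum still needs beyond
    those idle replicas has to come from compacting nodes.
    """
    idle_replicas = min(replication_factor, nodes - compacting)
    return max(0, quorum(replication_factor) - idle_replicas)


if __name__ == "__main__":
    # The cluster described above: 6 nodes, RF 5, 4 nodes compacting.
    print(quorum(5))
    print(forced_compacting_replicas(6, 5, 4))
```

With 6 nodes, RF 5 and 4 nodes compacting, only 2 idle nodes remain while a quorum needs 3 replicas, so every QUORUM read or write has to involve at least one compacting node. With 3 nodes compacting, a favourably placed key can still reach a quorum on idle nodes alone.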
> Adding nodes to delay and reduce compaction is an interesting performance
> use case :-)  I think you can find a smarter/cheaper way to manage
> that.
> Have you looked at:
> a) increasing memtable throughput
> What is the nature of your writes? Are they mostly inserts, or are there also
> a lot of quick updates of recently inserted data? Increasing
> memtable_throughput can delay and maybe reduce the compaction cost if you
> have lots of updates to the same data. You will have to provide the extra
> memory if you try this.
> When you mentioned "with ~9m serialized bytes", is that the memtable
> throughput? That is quite a low threshold, which will result in a large number
> of SSTables needing to be compacted. I think the default is 256 MB, and the
> lowest values I have seen are 64 MB or maybe 32 MB.
>
>
> b) tweaking min_compaction_threshold and max_compaction_threshold
> - increasing min_compaction_threshold will delay compactions
> - decreasing max_compaction_threshold will reduce the number of SSTables per
> compaction cycle
> Are you using the defaults (4 and 32), or are you trying different values?
>
> c) splitting column families
> Splitting column families can also help, because compactions occur
> serially, one CF at a time, which spreads your compaction cost over
> time and column families. It requires a change in app logic, though.
>
> -Adi
>
>
>
