Re: Compaction Strategy guidance

Jean-Armel Luce Sun, 23 Nov 2014 23:30:47 -0800

Hi Nikolai,

Could you please clarify what you mean by "a large amount of
data"?


How many tables?
How many rows in your largest table?
How many GB in your largest table?
How many GB per node?

Thanks.



2014-11-24 8:27 GMT+01:00 Jean-Armel Luce <jaluc...@gmail.com>:

> Hi Nikolai,
>
> Thanks for the information.
>
> Please could you clarify a little bit what you call "
>
> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev <ngrigor...@gmail.com>:
>
>> Just to clarify - when I was talking about the large amount of data, I
>> really meant a large amount of data per node in a single CF (table). LCS
>> does not seem to like it when it gets thousands of sstables (which makes
>> 4-5 levels).
>>
>> When bootstrapping a new node, you had better enable the option from
>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
>> mess - I have a node that I bootstrapped ~2 weeks ago. Initially it had
>> 7.5K pending compactions; now it has almost stabilized at 4.6K and does
>> not go down. The number of sstables at L0 is over 11K, and it is very
>> slowly building the upper levels. The total number of sstables is 4x the
>> normal amount. Now I am not entirely sure this node will ever get back to
>> normal life. And believe me - this is not because of I/O. I have SSDs
>> everywhere and 16 physical cores, and this machine is barely using 1-3
>> cores most of the time. The problem is that allowing the STCS fallback is
>> not a good option either - it will quickly result in a few 200 GB+
>> sstables in my configuration, and then these sstables will never be
>> compacted. Plus, it will require close to 2x disk space on EVERY disk in
>> my JBOD configuration... this will kill the node sooner or later. This is
>> all because all sstables end up at L0 after bootstrap, and then the
>> process very slowly moves them to other levels. If you have write traffic
>> to that CF, then the number of sstables at L0 will grow quickly - as is
>> happening in my case now.
>>
>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
>> is implemented it may be better.
>>
>>
>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov <aiva...@iponweb.net>
>> wrote:
>>
>>> Stephane,
>>>
>>> We have a somewhat similar C* load profile, hence some comments
>>> in addition to Nikolai's answer.
>>> 1. Fallback to STCS - you can actually disable it.
>>> 2. Based on our experience, if you have a lot of data per node, LCS
>>> may work just fine. That is, until the moment you decide to join
>>> another node - chances are that the newly added node will not be able
>>> to compact what it gets from the old nodes. In your case, if you switch
>>> strategies, the same thing may happen. This is all due to the
>>> limitations mentioned by Nikolai.
>>>
>>> Andrei,
>>>
>>>
>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. <smg...@gmail.com>
>>> wrote:
>>> > ABUSE
>>> >
>>> >
>>> >
>>> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
>>> >
>>> >
>>> >
>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>>> > Sent: Saturday, November 22, 2014 07:13 PM
>>> > To: user@cassandra.apache.org
>>> > Subject: Re: Compaction Strategy guidance
>>> > Importance: High
>>> >
>>> >
>>> >
>>> > Stephane,
>>> >
>>> > Like everything good, LCS comes at a certain price.
>>> >
>>> > LCS will put the most load on your I/O system (if you use spindles, you
>>> > may need to be careful about that) and on the CPU. Also, LCS (by default)
>>> > may fall back to STCS if it is falling behind (which is very possible
>>> > with heavy write activity), and this will result in higher disk space
>>> > usage. LCS also has a certain limitation that I have discovered lately:
>>> > sometimes LCS may not be able to use all of your node's resources
>>> > (algorithm limitations), and this reduces the overall compaction
>>> > throughput. This may happen if you have a large column family with lots
>>> > of data per node. STCS won't have this limitation.
>>> >
>>> >
>>> >
>>> > By the way, the primary goal of LCS is to reduce the number of sstables
>>> > C* has to look at to find your data. With LCS functioning properly, this
>>> > number will most likely be between 1 and 3 for most reads. But if you do
>>> > few reads and are not concerned about latency today, LCS will most
>>> > likely only save you some disk space.
>>> >
>>> >
>>> >
>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay <sle...@looplogic.com>
>>> > wrote:
>>> >
>>> > Hi there,
>>> >
>>> >
>>> >
>>> > use case:
>>> >
>>> >
>>> >
>>> > - Heavy write app, few reads.
>>> >
>>> > - Lots of updates of rows / columns.
>>> >
>>> > - Current performance is fine, for both writes and reads.
>>> >
>>> > - Currently using SizeTieredCompactionStrategy
>>> >
>>> >
>>> >
>>> > We're trying to limit the amount of storage used during compaction.
>>> > Should we switch to LeveledCompactionStrategy?
>>> >
>>> >
>>> >
>>> > Thanks
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Nikolai Grigoriev
>>> > (514) 772-5178
>>>
>>
>>
>>
>> --
>> Nikolai Grigoriev
>> (514) 772-5178
>>
>
>
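
The "thousands of sstables (makes 4-5 levels)" remark quoted above can be sanity-checked with a rough sketch. It assumes the commonly described LCS layout in which each level holds ten times as many sstables as the one below it (L1 = 10, L2 = 100, and so on) and a target sstable size of 160 MB. Both numbers are assumptions, not values reported in this thread, and the helper names are made up for illustration. L0 is ignored.

```python
import math


def sstable_count(data_gb, sstable_mb=160):
    """Approximate number of sstables needed to hold data_gb of data.

    sstable_mb=160 is an assumed target sstable size, not a value
    taken from the thread.
    """
    return math.ceil(data_gb * 1024 / sstable_mb)


def levels_needed(sstables, fanout=10):
    """Number of levels (L1 and up) needed to hold `sstables` sstables.

    Assumes level N holds at most fanout**N sstables; L0 is ignored.
    """
    level = 0
    remaining = sstables
    while remaining > 0:
        level += 1
        remaining -= fanout ** level
    return level


if __name__ == "__main__":
    # Hypothetical per-node data sizes for a single table.
    for gb in (100, 500, 2000):
        n = sstable_count(gb)
        print(f"{gb} GB per node -> ~{n} sstables -> {levels_needed(n)} levels")
```

Under these assumptions, about 500 GB in one table maps to 4 levels and about 2 TB to 5, which is in line with the range mentioned in the thread.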
