OK, let's see - my cluster is recompacting now ;-) I will let you know
if this helps.

On Mon, Nov 24, 2014 at 5:48 PM, Nikolai Grigoriev <ngrigor...@gmail.com> wrote:
> I was thinking about that option and I would be curious to find out how
> this change helps you. I suspected that increasing the sstable size won't
> help much because the compaction throughput (per task/thread) is still the
> same. So it will simply take 4x longer to finish a compaction task. It is
> possible that, because of that, the CPU will be under-used for even longer.
>
> My data model, unfortunately, requires this amount of data. And I suspect
> that regardless of how it is organized I won't be able to optimize it - I
> do need this data to be in one row so I can read it quickly.
>
> One of the obvious recommendations I have received was to run more than one
> instance of C* per host. Makes sense - it will reduce the amount of data per
> node and make better use of the resources. I would go for it myself, but it
> may be a challenge for the people in operations. Without VMs it would be
> trickier for them to operate such a setup, and I do not want any VMs there.
>
> Another option is probably to simply shard my data between several identical
> tables in the same keyspace. I could also think about different keyspaces,
> but I prefer not to spread the data for the same logical "tenant" across
> multiple keyspaces. I would take my primary key's hash, do something like
> mod 4, and append the result to the table name :) This would effectively
> reduce the number of sstables and the amount of data per table (CF). I kind
> of like this idea more - yes, a bit more of a challenge at the coding level,
> but obvious benefits without extra operational complexity.
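>
> Roughly, the client-side routing could look like the sketch below (just an
> illustration - the shard count and the "wm_contacts_N" / "my_keyspace" names
> are made up, and the shard tables would all share the original schema):
>
>     // Sketch: route each read/write to one of a few identical tables
>     // based on a hash of the partition key.
>     public final class ShardedTableRouter {
>         private static final int NUM_SHARDS = 4;
>
>         // Deterministically map a partition key to a shard table name,
>         // e.g. "wm_contacts_0" .. "wm_contacts_3".
>         public static String shardTable(String baseTable, String partitionKey) {
>             // Mask the sign bit so the modulo is never negative.
>             int shard = (partitionKey.hashCode() & Integer.MAX_VALUE) % NUM_SHARDS;
>             return baseTable + "_" + shard;
>         }
>
>         public static void main(String[] args) {
>             // The same key always lands in the same shard table.
>             String table = shardTable("wm_contacts", "tenant-42:contact-1001");
>             // The actual CQL statement is then built against that table.
>             System.out.println("INSERT INTO my_keyspace." + table + " (...) VALUES (...)");
>         }
>     }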
>
>
> On Mon, Nov 24, 2014 at 9:32 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
>>
>> Nikolai,
>>
>> This is more or less what I'm seeing on my cluster, then. I'm trying to
>> switch to bigger sstables right now (1Gb).
>>
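>> The change itself is just a compaction-settings tweak, e.g. something like
>> the sketch below (placeholder keyspace/table name and contact point, run
>> here through the DataStax Java driver - the same ALTER TABLE can also be
>> run from cqlsh):
>>
>>     import com.datastax.driver.core.Cluster;
>>     import com.datastax.driver.core.Session;
>>
>>     public final class RaiseSstableSize {
>>         public static void main(String[] args) {
>>             // Connect to one node of the cluster (address is a placeholder).
>>             Cluster cluster = Cluster.builder()
>>                     .addContactPoint("127.0.0.1").build();
>>             Session session = cluster.connect();
>>             // Keep LCS, but raise the target sstable size to ~1Gb (1024 MB).
>>             session.execute(
>>                 "ALTER TABLE my_keyspace.my_table WITH compaction = "
>>               + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 1024}");
>>             cluster.close();
>>         }
>>     }
>>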
>> On Mon, Nov 24, 2014 at 5:18 PM, Nikolai Grigoriev <ngrigor...@gmail.com>
>> wrote:
>> > Andrei,
>> >
>> > Oh, Monday mornings... Tb :)
>> >
>> > On Mon, Nov 24, 2014 at 9:12 AM, Andrei Ivanov <aiva...@iponweb.net>
>> > wrote:
>> >>
>> >> Nikolai,
>> >>
>> >> Are you sure about 1.26Gb? It doesn't look right - 5195 sstables with a
>> >> 256Mb sstable size...
>> >>
>> >> Andrei
>> >>
>> >> On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev
>> >> <ngrigor...@gmail.com>
>> >> wrote:
>> >> > Jean-Armel,
>> >> >
>> >> > I have only two large tables; the rest are super-small. In the test
>> >> > cluster of 15 nodes the largest table has about 110M rows. Its total
>> >> > size is about 1.26Gb per node (total disk space used per node for that
>> >> > CF). It's got about 5K sstables per node - the sstable size is 256Mb.
>> >> > cfstats on a "healthy" node looks like this:
>> >> >
>> >> >     Read Count: 8973748
>> >> >     Read Latency: 16.130059053251774 ms.
>> >> >     Write Count: 32099455
>> >> >     Write Latency: 1.6124713938912671 ms.
>> >> >     Pending Tasks: 0
>> >> >         Table: wm_contacts
>> >> >         SSTable count: 5195
>> >> >         SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0, 0, 0, 0]
>> >> >         Space used (live), bytes: 1266060391852
>> >> >         Space used (total), bytes: 1266144170869
>> >> >         SSTable Compression Ratio: 0.32604853410787327
>> >> >         Number of keys (estimate): 25696000
>> >> >         Memtable cell count: 71402
>> >> >         Memtable data size, bytes: 26938402
>> >> >         Memtable switch count: 9489
>> >> >         Local read count: 8973748
>> >> >         Local read latency: 17.696 ms
>> >> >         Local write count: 32099471
>> >> >         Local write latency: 1.732 ms
>> >> >         Pending tasks: 0
>> >> >         Bloom filter false positives: 32248
>> >> >         Bloom filter false ratio: 0.50685
>> >> >         Bloom filter space used, bytes: 20744432
>> >> >         Compacted partition minimum bytes: 104
>> >> >         Compacted partition maximum bytes: 3379391
>> >> >         Compacted partition mean bytes: 172660
>> >> >         Average live cells per slice (last five minutes): 495.0
>> >> >         Average tombstones per slice (last five minutes): 0.0
>> >> >
>> >> > Another table of similar structure (same number of rows) is about 4x
>> >> > smaller. That table does not suffer from those issues - it compacts
>> >> > well and efficiently.
>> >> >
>> >> > On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce <jaluc...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Nikolai,
>> >> >>
>> >> >> Please could you clarify a little bit what you call "a large amount
>> >> >> of data"?
>> >> >>
>> >> >> How many tables ?
>> >> >> How many rows in your largest table ?
>> >> >> How many GB in your largest table ?
>> >> >> How many GB per node ?
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce <jaluc...@gmail.com>:
>> >> >>>
>> >> >>> Hi Nikolai,
>> >> >>>
>> >> >>> Thanks for that information.
>> >> >>>
>> >> >>> Please could you clarify a little bit what you call "
>> >> >>>
>> >> >>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev <ngrigor...@gmail.com>:
>> >> >>>>
>> >> >>>> Just to clarify - when I was talking about a large amount of data, I
>> >> >>>> really meant a large amount of data per node in a single CF (table).
>> >> >>>> LCS does not seem to like it when it gets thousands of sstables
>> >> >>>> (making 4-5 levels).
>> >> >>>>
>> >> >>>> When bootstrapping a new node you'd better enable that option from
>> >> >>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still
>> >> >>>> be a mess - I have a node that I bootstrapped ~2 weeks ago. Initially
>> >> >>>> it had 7.5K pending compactions; now it has almost stabilized at 4.6K
>> >> >>>> and does not go down. The number of sstables at L0 is over 11K and it
>> >> >>>> is very slowly building the upper levels. The total number of
>> >> >>>> sstables is 4x the normal amount. Now I am not entirely sure this
>> >> >>>> node will ever get back to normal life. And believe me - this is not
>> >> >>>> because of I/O: I have SSDs everywhere and 16 physical cores, and the
>> >> >>>> machine is barely using 1-3 cores most of the time. The problem is
>> >> >>>> that allowing the STCS fallback is not a good option either - it will
>> >> >>>> quickly result in a few 200Gb+ sstables in my configuration, and then
>> >> >>>> these sstables will never be compacted. Plus, it will require close
>> >> >>>> to 2x disk space on EVERY disk in my JBOD configuration... this will
>> >> >>>> kill the node sooner or later. This all happens because all sstables
>> >> >>>> after bootstrap end up at L0 and then the process very slowly moves
>> >> >>>> them to the other levels. If you have write traffic to that CF, the
>> >> >>>> number of sstables at L0 will grow quickly - as is happening in my
>> >> >>>> case right now.
>> >> >>>>
>> >> >>>> Once something like
>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-8301
>> >> >>>> is implemented it may be better.
>> >> >>>>
>> >> >>>>
>> >> >>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov
>> >> >>>> <aiva...@iponweb.net>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> Stephane,
>> >> >>>>>
>> >> >>>>> We have a somewhat similar C* load profile, hence some comments in
>> >> >>>>> addition to Nikolai's answer.
>> >> >>>>> 1. Fallback to STCS - you can actually disable it.
>> >> >>>>> 2. Based on our experience, if you have a lot of data per node, LCS
>> >> >>>>> may work just fine. That is, until the moment you decide to join
>> >> >>>>> another node - chances are that the newly added node will not be
>> >> >>>>> able to compact what it gets from the old nodes. In your case, if
>> >> >>>>> you switch strategy, the same thing may happen. This is all due to
>> >> >>>>> the limitations mentioned by Nikolai.
>> >> >>>>>
>> >> >>>>> Andrei,
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G.
>> >> >>>>> <smg...@gmail.com>
>> >> >>>>> wrote:
>> >> >>>>> > ABUSE
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > I DO NOT WANT ANY MORE MAILS, I AM FROM MEXICO
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>> >> >>>>> > Sent: Saturday, November 22, 2014 07:13 PM
>> >> >>>>> > To: user@cassandra.apache.org
>> >> >>>>> > Subject: Re: Compaction Strategy guidance
>> >> >>>>> > Importance: High
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > Stephane,
>> >> >>>>> >
>> >> >>>>> > Like everything good, LCS comes at a certain price.
>> >> >>>>> >
>> >> >>>>> > LCS will put more load on your I/O system (if you use spindles you
>> >> >>>>> > may need to be careful about that) and on your CPU. Also, LCS (by
>> >> >>>>> > default) may fall back to STCS if it is falling behind (which is
>> >> >>>>> > very possible with heavy write activity), and this will result in
>> >> >>>>> > higher disk space usage. LCS also has a certain limitation I have
>> >> >>>>> > discovered lately: sometimes LCS may not be able to use all of
>> >> >>>>> > your node's resources (algorithm limitations), and this reduces
>> >> >>>>> > the overall compaction throughput. This may happen if you have a
>> >> >>>>> > large column family with lots of data per node. STCS won't have
>> >> >>>>> > this limitation.
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > By the way, the primary goal of LCS is to reduce the number of
>> >> >>>>> > sstables C* has to look at to find your data. With LCS functioning
>> >> >>>>> > properly, this number will most likely be between 1 and 3 for most
>> >> >>>>> > reads. But if you do few reads and are not concerned about latency
>> >> >>>>> > today, LCS will most likely only save you some disk space.
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay
>> >> >>>>> > <sle...@looplogic.com>
>> >> >>>>> > wrote:
>> >> >>>>> >
>> >> >>>>> > Hi there,
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > use case:
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > - Heavy write app, few reads.
>> >> >>>>> >
>> >> >>>>> > - Lots of updates of rows / columns.
>> >> >>>>> >
>> >> >>>>> > - Current performance is fine for both writes and reads.
>> >> >>>>> >
>> >> >>>>> > - Currently using SizeTieredCompactionStrategy
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > We're trying to limit the amount of storage used during
>> >> >>>>> > compaction. Should we switch to LeveledCompactionStrategy?
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > Thanks
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > --
>> >> >>>>> >
>> >> >>>>> > Nikolai Grigoriev
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> Nikolai Grigoriev
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Nikolai Grigoriev
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > Nikolai Grigoriev
>> > (514) 772-5178
>
>
>
>
> --
> Nikolai Grigoriev
> (514) 772-5178
