OK, let's see - my cluster is recompacting now ;-) I will let you know if this helps.
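
For reference, the change itself is one statement per table - a sketch via
the Java driver, with made-up contact point and keyspace/table names:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    // Sketch: keep LCS but raise the per-sstable target to 1 GB.
    // Contact point, keyspace and table names are placeholders.
    public final class RaiseSstableSize {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            session.execute("ALTER TABLE my_ks.my_table WITH compaction = "
                    + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 1024}");
            cluster.close();
        }
    }

Existing sstables get rewritten to the new size as they are compacted,
hence the recompaction going on now.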
On Mon, Nov 24, 2014 at 5:48 PM, Nikolai Grigoriev <ngrigor...@gmail.com> wrote:
> I was thinking about that option and I would be curious to find out how
> this change helps you. I suspected that increasing the sstable size won't
> help too much because the compaction throughput (per task/thread) is
> still the same, so it will simply take 4x longer to finish a compaction
> task. It is possible that because of that the CPU will be under-used for
> even longer.
>
> My data model, unfortunately, requires this amount of data. And I suspect
> that regardless of how it is organized I won't be able to optimize it - I
> do need these rows to be in one row so I can read them quickly.
>
> One of the obvious recommendations I have received was to run more than
> one instance of C* per host. Makes sense - it would reduce the amount of
> data per node and make better use of the resources. I would go for it
> myself, but it may be a challenge for the people in operations: without
> VMs such a setup is trickier to operate, and I do not want any VMs there.
>
> Another option is probably to simply shard my data between several
> identical tables in the same keyspace. I could also think about different
> keyspaces, but I prefer not to spread the data for the same logical
> "tenant" across multiple keyspaces. Take my primary key's hash, do
> something like mod 4, and add that to the table name :) This would
> effectively reduce the number of sstables and the amount of data per
> table (CF). I like this idea more - a bit more challenge at the coding
> level, but obvious benefits without extra operational complexity. See
> the sketch below.
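>
> Roughly what I mean, in Java - the shard count, base table name and key
> type are made up for illustration:
>
>     // Hypothetical helper: route a key to one of several identical tables.
>     public final class TableSharding {
>         static final int SHARDS = 4;
>
>         static String shardTable(String baseName, String partitionKey) {
>             // Mask the sign bit so the bucket is never negative,
>             // then take the hash modulo the shard count.
>             int bucket = (partitionKey.hashCode() & 0x7fffffff) % SHARDS;
>             return baseName + "_" + bucket;  // e.g. wm_contacts_0 .. wm_contacts_3
>         }
>
>         public static void main(String[] args) {
>             System.out.println(shardTable("wm_contacts", "tenant-42"));
>         }
>     }
>
> Every read and write then targets shardTable(...) instead of one huge CF.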
>
> On Mon, Nov 24, 2014 at 9:32 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
>>
>> Nikolai,
>>
>> This is more or less what I'm seeing on my cluster then. Trying to
>> switch to bigger sstables right now (1Gb).
>>
>> On Mon, Nov 24, 2014 at 5:18 PM, Nikolai Grigoriev <ngrigor...@gmail.com> wrote:
>> > Andrei,
>> >
>> > Oh, Monday mornings... Tb :)
>> >
>> > On Mon, Nov 24, 2014 at 9:12 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
>> >>
>> >> Nikolai,
>> >>
>> >> Are you sure about 1.26Gb? It doesn't look right - 5195 sstables
>> >> with a 256Mb sstable size...
>> >>
>> >> Andrei
>> >>
>> >> On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev <ngrigor...@gmail.com> wrote:
>> >> > Jean-Armel,
>> >> >
>> >> > I have only two large tables; the rest are super-small. In the
>> >> > test cluster of 15 nodes the largest table has about 110M rows.
>> >> > Its total size is about 1,26Gb per node (total disk space used
>> >> > per node for that CF). It's got about 5K sstables per node - the
>> >> > sstable size is 256Mb. cfstats on a "healthy" node looks like this:
>> >> >
>> >> > Read Count: 8973748
>> >> > Read Latency: 16.130059053251774 ms.
>> >> > Write Count: 32099455
>> >> > Write Latency: 1.6124713938912671 ms.
>> >> > Pending Tasks: 0
>> >> >         Table: wm_contacts
>> >> >         SSTable count: 5195
>> >> >         SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0, 0, 0, 0]
>> >> >         Space used (live), bytes: 1266060391852
>> >> >         Space used (total), bytes: 1266144170869
>> >> >         SSTable Compression Ratio: 0.32604853410787327
>> >> >         Number of keys (estimate): 25696000
>> >> >         Memtable cell count: 71402
>> >> >         Memtable data size, bytes: 26938402
>> >> >         Memtable switch count: 9489
>> >> >         Local read count: 8973748
>> >> >         Local read latency: 17.696 ms
>> >> >         Local write count: 32099471
>> >> >         Local write latency: 1.732 ms
>> >> >         Pending tasks: 0
>> >> >         Bloom filter false positives: 32248
>> >> >         Bloom filter false ratio: 0.50685
>> >> >         Bloom filter space used, bytes: 20744432
>> >> >         Compacted partition minimum bytes: 104
>> >> >         Compacted partition maximum bytes: 3379391
>> >> >         Compacted partition mean bytes: 172660
>> >> >         Average live cells per slice (last five minutes): 495.0
>> >> >         Average tombstones per slice (last five minutes): 0.0
>> >> >
>> >> > Another table of similar structure (same number of rows) is about
>> >> > 4x smaller. That table does not suffer from these issues - it
>> >> > compacts well and efficiently.
>> >> >
>> >> > On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce <jaluc...@gmail.com> wrote:
>> >> >>
>> >> >> Hi Nikolai,
>> >> >>
>> >> >> Please could you clarify a little bit what you call "a large
>> >> >> amount of data"?
>> >> >>
>> >> >> How many tables?
>> >> >> How many rows in your largest table?
>> >> >> How many GB in your largest table?
>> >> >> How many GB per node?
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce <jaluc...@gmail.com>:
>> >> >>>
>> >> >>> Hi Nikolai,
>> >> >>>
>> >> >>> Thanks for this information.
>> >> >>>
>> >> >>> Please could you clarify a little bit what you call "
>> >> >>>
>> >> >>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev <ngrigor...@gmail.com>:
>> >> >>>>
>> >> >>>> Just to clarify - when I was talking about the large amount of
>> >> >>>> data I really meant a large amount of data per node in a single
>> >> >>>> CF (table). LCS does not seem to like it when it gets thousands
>> >> >>>> of sstables (makes 4-5 levels).
>> >> >>>>
>> >> >>>> When bootstrapping a new node you'd better enable that option
>> >> >>>> from CASSANDRA-6621 - the one that disables STCS in L0; see the
>> >> >>>> flag below. But it will still be a mess: I have a node that I
>> >> >>>> bootstrapped ~2 weeks ago. Initially it had 7,5K pending
>> >> >>>> compactions; now it has almost stabilized at 4,6K and does not
>> >> >>>> go down. The number of sstables at L0 is over 11K and it is
>> >> >>>> slowly, slowly building the upper levels. The total number of
>> >> >>>> sstables is 4x the normal amount. Now I am not entirely sure
>> >> >>>> this node will ever get back to normal life. And believe me -
>> >> >>>> this is not because of I/O: I have SSDs everywhere and 16
>> >> >>>> physical cores, and this machine is barely using 1-3 cores most
>> >> >>>> of the time. The problem is that allowing the STCS fallback is
>> >> >>>> not a good option either - it will quickly result in a few
>> >> >>>> 200Gb+ sstables in my configuration, and then those sstables
>> >> >>>> will never be compacted. Plus, it will require close to 2x disk
>> >> >>>> space on EVERY disk in my JBOD configuration... this will kill
>> >> >>>> the node sooner or later. This is all because all sstables
>> >> >>>> after bootstrap end up at L0, and the process then slowly,
>> >> >>>> slowly moves them to the other levels. If you have write
>> >> >>>> traffic to that CF, the number of sstables at L0 will grow
>> >> >>>> quickly - as is happening in my case now.
>> >> >>>>
>> >> >>>> Once something like
>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-8301
>> >> >>>> is implemented it may be better.
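>> >> >>>>
>> >> >>>> (If I remember correctly, the CASSANDRA-6621 switch is a JVM
>> >> >>>> system property - a sketch of setting it in cassandra-env.sh;
>> >> >>>> double-check the property name against the ticket:)
>> >> >>>>
>> >> >>>>     JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"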
>> >> >>>>
>> >> >>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
>> >> >>>>>
>> >> >>>>> Stephane,
>> >> >>>>>
>> >> >>>>> We have a somewhat similar C* load profile, hence some
>> >> >>>>> comments in addition to Nikolai's answer.
>> >> >>>>> 1. Fallback to STCS - you can actually disable it.
>> >> >>>>> 2. Based on our experience, if you have a lot of data per
>> >> >>>>> node, LCS may work just fine - that is, till the moment you
>> >> >>>>> decide to join another node. Chances are that the newly added
>> >> >>>>> node will not be able to compact what it gets from the old
>> >> >>>>> nodes. In your case, if you switch strategy, the same thing
>> >> >>>>> may happen. This is all due to the limitations mentioned by
>> >> >>>>> Nikolai.
>> >> >>>>>
>> >> >>>>> Andrei
>> >> >>>>>
>> >> >>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. <smg...@gmail.com> wrote:
>> >> >>>>> > ABUSE
>> >> >>>>> >
>> >> >>>>> > I DO NOT WANT ANY MORE MAILS. I AM FROM MEXICO.
>> >> >>>>> >
>> >> >>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>> >> >>>>> > Sent: Saturday, November 22, 2014, 07:13 PM
>> >> >>>>> > To: user@cassandra.apache.org
>> >> >>>>> > Subject: Re: Compaction Strategy guidance
>> >> >>>>> > Importance: High
>> >> >>>>> >
>> >> >>>>> > Stephane,
>> >> >>>>> >
>> >> >>>>> > Like everything good, LCS comes at a certain price.
>> >> >>>>> >
>> >> >>>>> > LCS will put the most load on your I/O system (if you use
>> >> >>>>> > spindles, you may need to be careful about that) and on the
>> >> >>>>> > CPU. Also, LCS (by default) may fall back to STCS if it is
>> >> >>>>> > falling behind (which is very possible with heavy write
>> >> >>>>> > activity), and this will result in higher disk space usage.
>> >> >>>>> > LCS also has a certain limitation I have discovered lately:
>> >> >>>>> > sometimes LCS may not be able to use all of your node's
>> >> >>>>> > resources (algorithm limitations), and this reduces the
>> >> >>>>> > overall compaction throughput. This may happen if you have
>> >> >>>>> > a large column family with lots of data per node. STCS
>> >> >>>>> > won't have this limitation.
>> >> >>>>> >
>> >> >>>>> > By the way, the primary goal of LCS is to reduce the number
>> >> >>>>> > of sstables C* has to look at to find your data. With LCS
>> >> >>>>> > properly functioning, this number will most likely be
>> >> >>>>> > between 1 and 3 for most of the reads. But if you do few
>> >> >>>>> > reads and are not concerned about latency today, LCS will
>> >> >>>>> > most likely only save you some disk space.
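>> >> >>>>> >
>> >> >>>>> > If you do decide to switch, the change itself is a single
>> >> >>>>> > statement per table - a sketch, assuming an open Java-driver
>> >> >>>>> > Session and a made-up keyspace/table name; C* then re-levels
>> >> >>>>> > the existing sstables in the background:
>> >> >>>>> >
>> >> >>>>> >     // session is an open com.datastax.driver.core.Session
>> >> >>>>> >     session.execute("ALTER TABLE my_ks.my_table WITH compaction = "
>> >> >>>>> >         + "{'class': 'LeveledCompactionStrategy'}");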
>> >> >>>>> >
>> >> >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay <sle...@looplogic.com> wrote:
>> >> >>>>> >
>> >> >>>>> > Hi there,
>> >> >>>>> >
>> >> >>>>> > Use case:
>> >> >>>>> >
>> >> >>>>> > - Heavy write app, few reads.
>> >> >>>>> > - Lots of updates of rows / columns.
>> >> >>>>> > - Current performance is fine, for both writes and reads.
>> >> >>>>> > - Currently using SizeTieredCompactionStrategy.
>> >> >>>>> >
>> >> >>>>> > We're trying to limit the amount of storage used during
>> >> >>>>> > compaction. Should we switch to LeveledCompactionStrategy?
>> >> >>>>> >
>> >> >>>>> > Thanks
>> >> >>>>> >
>> >> >>>>> > --
>> >> >>>>> > Nikolai Grigoriev
>> >> >>>>
>> >> >>>> --
>> >> >>>> Nikolai Grigoriev
>> >> >
>> >> > --
>> >> > Nikolai Grigoriev
>> >
>> > --
>> > Nikolai Grigoriev
>> > (514) 772-5178
>
> --
> Nikolai Grigoriev
> (514) 772-5178