Hi Nikolai,

Could you please clarify what you mean by "a large amount of data"?
How many tables? How many rows in your largest table? How many GB in your largest table? How many GB per node?

Thanks.

2014-11-24 8:27 GMT+01:00 Jean-Armel Luce <jaluc...@gmail.com>:

> Hi Nikolai,
>
> Thanks for that information.
>
> Please could you clarify a little bit what you call "
>
> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev <ngrigor...@gmail.com>:
>
>> Just to clarify - when I was talking about the large amount of data I
>> really meant a large amount of data per node in a single CF (table). LCS
>> does not seem to like it when it gets thousands of sstables (makes 4-5
>> levels).
>>
>> When bootstrapping a new node you'd better enable the option from
>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
>> mess - I have a node that I bootstrapped ~2 weeks ago. Initially it had
>> 7.5K pending compactions, now it has almost stabilized at 4.6K. It does
>> not go down. The number of sstables at L0 is over 11K and it is slowly,
>> slowly building the upper levels. The total number of sstables is 4x the
>> normal amount. Now I am not entirely sure if this node will ever get back
>> to normal life. And believe me - this is not because of I/O, I have SSDs
>> everywhere and 16 physical cores. This machine is barely using 1-3 cores
>> most of the time. The problem is that allowing STCS fallback is not a
>> good option either - it will quickly result in a few 200 GB+ sstables in
>> my configuration and then these sstables will never be compacted. Plus,
>> it will require close to 2x disk space on EVERY disk in my JBOD
>> configuration... this will kill the node sooner or later. This is all
>> because all sstables after bootstrap end up at L0 and then the process
>> slowly, slowly moves them to other levels. If you have write traffic to
>> that CF then the number of sstables at L0 will grow quickly - like it
>> happens in my case now.
>>
>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
>> is implemented it may be better.
>>
>>
>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov <aiva...@iponweb.net>
>> wrote:
>>
>>> Stephane,
>>>
>>> We have a somewhat similar C* load profile, hence some comments in
>>> addition to Nikolai's answer.
>>> 1. Fallback to STCS - you can actually disable it.
>>> 2. Based on our experience, if you have a lot of data per node, LCS
>>> may work just fine. That is, until the moment you decide to join
>>> another node - chances are that the newly added node will not be able
>>> to compact what it gets from the old nodes. In your case, if you switch
>>> strategy the same thing may happen. This is all due to the limitations
>>> mentioned by Nikolai.
>>>
>>> Andrei
>>>
>>>
>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. <smg...@gmail.com>
>>> wrote:
>>> > ABUSE
>>> >
>>> > I DON'T WANT ANY MORE EMAILS. I AM FROM MEXICO
>>> >
>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>>> > Sent: Saturday, November 22, 2014 07:13 PM
>>> > To: user@cassandra.apache.org
>>> > Subject: Re: Compaction Strategy guidance
>>> > Importance: High
>>> >
>>> > Stephane,
>>> >
>>> > Like everything good, LCS comes at a certain price.
>>> >
>>> > LCS will put most of the load on your I/O system (if you use
>>> > spindles, you may need to be careful about that) and on the CPU. Also,
>>> > LCS (by default) may fall back to STCS if it is falling behind (which
>>> > is very possible with heavy write activity), and this will result in
>>> > higher disk space usage. LCS also has a certain limitation I have
>>> > discovered lately. Sometimes LCS may not be able to use all of your
>>> > node's resources (algorithm limitations), and this reduces the overall
>>> > compaction throughput. This may happen if you have a large column
>>> > family with lots of data per node. STCS won't have this limitation.
>>> >
>>> > By the way, the primary goal of LCS is to reduce the number of
>>> > sstables C* has to look at to find your data.
>>> > With LCS functioning properly, this number will most likely be
>>> > between 1 and 3 for most reads. But if you do few reads and are not
>>> > concerned about latency today, LCS will most likely only save you
>>> > some disk space.
>>> >
>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay <sle...@looplogic.com>
>>> > wrote:
>>> >
>>> > Hi there,
>>> >
>>> > use case:
>>> >
>>> > - Heavy write app, few reads.
>>> > - Lots of updates of rows / columns.
>>> > - Current performance is fine, for both writes and reads.
>>> > - Currently using SizeTieredCompactionStrategy
>>> >
>>> > We're trying to limit the amount of storage used during compaction.
>>> > Should we switch to LeveledCompactionStrategy?
>>> >
>>> > Thanks
>>> >
>>> > --
>>> >
>>> > Nikolai Grigoriev
>>> > (514) 772-5178
>>>
>>
>> --
>> Nikolai Grigoriev
>> (514) 772-5178
>>
>
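
For reference, the "thousands of sstables (makes 4-5 levels)" remark quoted above can be reproduced with some rough arithmetic. This is only a minimal sketch, not Cassandra's actual code: it assumes each LCS level holds ten times as many sstables as the level below it (L1 = 10, L2 = 100, and so on) and that sstables are about 160 MB each. The function names and the 500 GB / 2000 GB table sizes are hypothetical, chosen for illustration; none of them come from the thread.

```python
import math


def sstable_count(data_gb, sstable_size_mb=160):
    """Approximate number of sstables for a table holding data_gb gigabytes.

    sstable_size_mb=160 is an assumed target sstable size, not a value
    taken from the thread.
    """
    return math.ceil(data_gb * 1024 / sstable_size_mb)


def lcs_levels_needed(n_sstables, fanout=10):
    """Highest level (1 = L1, 2 = L2, ...) needed to hold n_sstables.

    Assumes level n holds at most fanout**n sstables and ignores L0.
    """
    level = 0
    capacity = 0
    while capacity < n_sstables:
        level += 1
        capacity += fanout ** level
    return level


if __name__ == "__main__":
    # Hypothetical per-node sizes for a single table.
    for data_gb in (500, 2000):
        n = sstable_count(data_gb)
        print(f"{data_gb} GB -> {n} sstables -> L{lcs_levels_needed(n)}")
    # 500 GB  -> 3200 sstables  -> L4
    # 2000 GB -> 12800 sstables -> L5
```

Under these assumptions, a single table in the range of roughly half a terabyte to a couple of terabytes per node lands at four or five levels, which matches the range Nikolai describes.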