Garo,

When you're loading data in bulk like this into a table using LCS, there are a few things you should do.
You can disable STCS in L0 (https://issues.apache.org/jira/browse/CASSANDRA-6621) with the JVM flag "-Dcassandra.disable_stcs_in_l0=true", which should stop you getting huge sstables in L0 while LCS is catching up. Once the load is complete you can then shut down the node and run sstableofflinerelevel (https://docs.datastax.com/en/cassandra/2.2/cassandra/tools/toolsSSTableOfflineRelevel.html). This should help LCS catch up and reduce the pending compactions, although it may still take a while.

Hope this helps.

Johnny

> On 6 Jul 2016, at 07:56, Juho Mäkinen <juho.maki...@gmail.com> wrote:
>
> Hello. I'm in the process of migrating my old 60 node cluster into a new 72 node cluster running 2.2.6. I fired BulkLoader on the old cluster to stream all data from every node in the old cluster to my new cluster, and I'm now watching as my new cluster is doing compactions. What I'd like is to understand the LeveledCompactionStrategy behaviour in more detail.
>
> I'm taking one node as an example, but all the other nodes are in much the same situation.
>
> There are 53 live SSTables in a big table. This can be seen both by looking at the la-*Data.db files and also with nodetool cfstats: "SSTables in each level: [31/4, 10, 12, 0, 0, 0, 0, 0, 0]"
>
> If I look at the SSTable files on disk I see some huge SSTables, like a 37 GiB, a 57 GiB and a 74 GiB one, which are all on level 0 (I used sstablemetadata to see this). The total size of all live sstables is about 920 GiB.
>
> Then there are tmp-la-*Data.db and tmplink-la-*Data.db files (the tmplink files are hardlinks to the tmp files due to CASSANDRA-6916). I guess that these come from the single active compaction. The total size of these files is around ~65 GiB.
> On the compaction side compactionstats shows that there's just one compaction running, which is heavily CPU bound (I've reformatted the output here):
>
> pending tasks: 5390
> bytes done: 673623792733 (673 GiB)
> bytes left: 3325656896682 (3325 GiB)
> Active compaction remaining time: 2h44m39s
>
> Why are bytes done and especially bytes left so big? I don't have that much data on my node.
>
> Also, how does Cassandra calculate the pending tasks with LCS?
>
> Why are there a few such big SSTables in the active sstable list? Is it because LCS falls back to STCS if L0 is too full? Should I use the stcs_in_l0:false option? What will happen to these big sstables in the future?
>
> I'm currently just waiting for the compactions to eventually finish, but I'm hoping to learn in more detail what the system does, and possibly to help with similar migrations in the future.
>
> Thanks,
>
> - Garo
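For reference, the flag and the offline relevel step I mentioned look roughly like this (a sketch, not a tested runbook; the keyspace/table names are placeholders and the service commands depend on how Cassandra is installed):

```shell
# Before the bulk load: disable the STCS-in-L0 fallback (CASSANDRA-6621)
# by adding the JVM option, e.g. in cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"

# After the load completes: flush/drain and stop the node cleanly.
nodetool drain
sudo service cassandra stop

# Relevel offline; --dry-run previews the new level layout first.
sstableofflinerelevel --dry-run my_keyspace my_table
sstableofflinerelevel my_keyspace my_table

sudo service cassandra start
```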
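On the pending-tasks question: the sketch below is my rough approximation of how an LCS-style estimate is derived, not Cassandra's actual code. The idea is that each level above its size target contributes roughly (excess bytes / sstable size) tasks, so hundreds of GiB parked in L0 after a bulk load translate into thousands of pending compactions even though the live data set is under 1 TiB:

```python
# Rough sketch of an LCS-style pending-task estimate (an approximation of
# the idea in Cassandra's LeveledManifest, not the real implementation).
# Numbers below (160 MB sstables, fanout 10) are the LCS defaults.

MAX_SSTABLE = 160 * 1024 ** 2   # sstable_size_in_mb default, in bytes
FANOUT = 10                     # each level targets 10x the previous one

def max_level_bytes(level):
    """Size target for a level: L1 ~ 10 sstables, L2 ~ 100, and so on.
    The L0 target of ~4 sstables is an assumption for this sketch."""
    if level == 0:
        return 4 * MAX_SSTABLE
    return (FANOUT ** level) * MAX_SSTABLE

def estimated_tasks(level_bytes):
    """level_bytes: total bytes currently sitting in L0, L1, L2, ..."""
    tasks = 0
    for level, total in enumerate(level_bytes):
        excess = total - max_level_bytes(level)
        if excess > 0:
            # every excess sstable's worth of data needs a compaction pass
            tasks += excess // MAX_SSTABLE
    return tasks

# A node with ~900 GiB stuck in L0 after BulkLoader and little promoted
# yet: nearly everything counts as pending work.
gib = 1024 ** 3
print(estimated_tasks([900 * gib, 10 * gib, 100 * gib]))  # -> 6350
```

That order of magnitude matches the 5390 pending tasks you're seeing: the estimate shrinks as data is compacted out of L0 and into levels that fit their targets.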