Teijo,

Unfortunately, my data set really does grow because it's a time series. I'm going to add a trick to aggregate old data, but it will still grow.
How often do you repair per day (or is it really continuous?)

I've been running experiments, and I wonder whether your decision to perform continuous repairs stems from the same thing I'm observing: I emptied a keyspace and started loading data into it (about 18,000 mutations/s). Every time I run a repair on that keyspace I get out-of-sync ranges. I just don't see how that is possible, given that:
- none of the nodes are going down
- tpstats shows only occasional backlog on the nodes (up to 2000 pending max)

Even weirder: when not writing to the keyspace, it took 4 consecutive repairs before there were no out-of-sync ranges anymore. Is repair probabilistic?

My CFs are created from the following template:

create column family PUBLIC_MONTHLY_20
  with column_type = Super
  with comparator = UTF8Type
  with subcomparator = BytesType
  and min_compaction_threshold = 2
  and read_repair_chance = 0
  and keys_cached = 20
  and rows_cached = 50
  and default_validation_class = CounterColumnType
  and replicate_on_write = true;

Philippe
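(For reference, a minimal sketch of how the repeated-repair experiment above could be scripted, to see whether consecutive repairs converge. The host, keyspace name and log path are placeholders, and the grep pattern is only an assumption about what the repair log lines contain; adjust all of them for your own install.)

    #!/bin/sh
    # Sketch: run several consecutive repairs on one keyspace and count how
    # many "out of sync" messages the repair has logged after each pass.
    # HOST, KEYSPACE and LOG are placeholders; the log wording is an
    # assumption and may differ between Cassandra versions.
    HOST=10.0.0.1
    KEYSPACE=MyKeyspace
    LOG=/var/log/cassandra/system.log
    for pass in 1 2 3 4; do
        echo "=== repair pass $pass ==="
        nodetool -h "$HOST" repair "$KEYSPACE"
        # note: this count is cumulative over the whole log file
        grep -c "out of sync" "$LOG"
    done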
2011/8/16 Teijo Holzer <thol...@wetafx.co.nz>

> Hi,
>
> we have come across this as well. We continuously run rolling repairs,
> followed by major compactions, followed by a gc() (or node restart) to get
> rid of all these SSTable files. Combined with aggressive TTLs on most
> inserts, the cluster stays nice and lean.
>
> You don't want your working set to grow indefinitely.
>
> Cheers,
>
>     T.
>
> On 16/08/11 08:08, Philippe wrote:
>> Forgot to mention that stopping & restarting the server brought the data
>> directory down to 283GB in less than 1 minute.
>>
>> Philippe
>> 2011/8/15 Philippe <watche...@gmail.com <mailto:watche...@gmail.com>>
>>
>>     It's another reason to avoid major / manual compactions which create a
>>     single big SSTable. Minor compactions keep things in buckets, which
>>     means newer SSTables can be compacted without needing to read the
>>     bigger, older tables.
>>
>> I've never run a major/manual compaction on this ring.
>> In my case, running repair on a "big" keyspace results in SSTables piling
>> up. My problematic node just filled up 483GB (yes, GB) of SSTables. Here
>> are the biggest:
>>
>> ls -laSrh
>> (...)
>> -rw-r--r-- 1 cassandra cassandra 2.7G 2011-08-15 14:13 PUBLIC_MONTHLY_20-g-4581-Data.db
>> -rw-r--r-- 1 cassandra cassandra 2.7G 2011-08-15 14:52 PUBLIC_MONTHLY_20-g-4641-Data.db
>> -rw-r--r-- 1 cassandra cassandra 2.8G 2011-08-15 14:39 PUBLIC_MONTHLY_20-tmp-g-4878-Data.db
>> -rw-r--r-- 1 cassandra cassandra 2.9G 2011-08-15 15:00 PUBLIC_MONTHLY_20-g-4656-Data.db
>> -rw-r--r-- 1 cassandra cassandra 3.0G 2011-08-15 14:17 PUBLIC_MONTHLY_20-g-4599-Data.db
>> -rw-r--r-- 1 cassandra cassandra 3.0G 2011-08-15 15:11 PUBLIC_MONTHLY_20-g-4675-Data.db
>> -rw-r--r-- 3 cassandra cassandra 3.1G 2011-08-13 10:34 PUBLIC_MONTHLY_18-g-3861-Data.db
>> -rw-r--r-- 1 cassandra cassandra 3.2G 2011-08-15 14:41 PUBLIC_MONTHLY_20-tmp-g-4884-Data.db
>> -rw-r--r-- 1 cassandra cassandra 3.6G 2011-08-15 14:44 PUBLIC_MONTHLY_20-tmp-g-4894-Data.db
>> -rw-r--r-- 1 cassandra cassandra 3.8G 2011-08-15 14:56 PUBLIC_MONTHLY_20-tmp-g-4934-Data.db
>> -rw-r--r-- 1 cassandra cassandra 3.8G 2011-08-15 14:46 PUBLIC_MONTHLY_20-tmp-g-4905-Data.db
>> -rw-r--r-- 1 cassandra cassandra 4.0G 2011-08-15 14:57 PUBLIC_MONTHLY_20-tmp-g-4935-Data.db
>> -rw-r--r-- 3 cassandra cassandra 5.9G 2011-08-13 12:53 PUBLIC_MONTHLY_19-g-4219-Data.db
>> -rw-r--r-- 3 cassandra cassandra 6.0G 2011-08-13 13:57 PUBLIC_MONTHLY_20-g-4538-Data.db
>> -rw-r--r-- 3 cassandra cassandra  12G 2011-08-13 09:27 PUBLIC_MONTHLY_20-g-4501-Data.db
>>
>> On the other nodes, the same directory is around 69GB. Why are there so
>> few large files there, and so many big ones on the repairing node?
>> -rw-r--r-- 1 cassandra cassandra 434M 2011-08-15 16:02 PUBLIC_MONTHLY_17-g-3525-Data.db
>> -rw-r--r-- 1 cassandra cassandra 456M 2011-08-15 15:50 PUBLIC_MONTHLY_19-g-4253-Data.db
>> -rw-r--r-- 1 cassandra cassandra 485M 2011-08-15 14:30 PUBLIC_MONTHLY_20-g-5280-Data.db
>> -rw-r--r-- 1 cassandra cassandra 572M 2011-08-15 15:15 PUBLIC_MONTHLY_18-g-3774-Data.db
>> -rw-r--r-- 2 cassandra cassandra 664M 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Index.db
>> -rw-r--r-- 2 cassandra cassandra 811M 2011-08-11 21:27 PUBLIC_MONTHLY_16-g-2597-Data.db
>> -rw-r--r-- 2 cassandra cassandra 915M 2011-08-13 04:00 PUBLIC_MONTHLY_18-g-3695-Data.db
>> -rw-r--r-- 1 cassandra cassandra 925M 2011-08-15 03:39 PUBLIC_MONTHLY_17-g-3454-Data.db
>> -rw-r--r-- 1 cassandra cassandra 1.3G 2011-08-15 13:46 PUBLIC_MONTHLY_19-g-4199-Data.db
>> -rw-r--r-- 2 cassandra cassandra 1.5G 2011-08-10 15:37 PUBLIC_MONTHLY_17-g-3218-Data.db
>> -rw-r--r-- 1 cassandra cassandra 1.9G 2011-08-15 14:35 PUBLIC_MONTHLY_20-g-5281-Data.db
>> -rw-r--r-- 2 cassandra cassandra 2.1G 2011-08-10 16:33 PUBLIC_MONTHLY_19-g-3946-Data.db
>> -rw-r--r-- 2 cassandra cassandra 3.1G 2011-08-10 22:23 PUBLIC_MONTHLY_18-g-3509-Data.db
>> -rw-r--r-- 2 cassandra cassandra 4.0G 2011-08-10 18:18 PUBLIC_MONTHLY_20-g-5024-Data.db
>> -rw------- 2 cassandra cassandra 5.1G 2011-08-09 15:23 PUBLIC_MONTHLY_19-g-3847-Data.db
>> -rw-r--r-- 2 cassandra cassandra 9.6G 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Data.db
>>
>> This whole compaction thing is getting me worried: how are sites in
>> production dealing with SSTables becoming larger and larger, and thus
>> taking longer and longer to compact? Adding nodes every couple of weeks?
>>
>> Philippe
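(For reference, the rolling routine Teijo describes above, repair each node, then a major compaction, then a restart or gc() so that obsolete SSTables get deleted, could look roughly like the sketch below. The host list, keyspace name and restart command are assumptions that depend on the cluster and on how Cassandra is packaged; this is an illustration, not a recommendation.)

    #!/bin/sh
    # Sketch of a rolling repair / major-compaction / restart cycle, one node
    # at a time, along the lines Teijo describes. HOSTS, KEYSPACE and the
    # restart command are placeholders for illustration only.
    KEYSPACE=MyKeyspace
    for HOST in 10.0.0.1 10.0.0.2 10.0.0.3; do
        nodetool -h "$HOST" repair "$KEYSPACE"    # anti-entropy repair on this node
        nodetool -h "$HOST" compact "$KEYSPACE"   # major compaction to merge SSTables
        # a restart (or a JMX-triggered GC) lets obsolete SSTable files be
        # removed; the actual command depends on the packaging in use
        ssh "$HOST" "sudo /etc/init.d/cassandra restart"
        sleep 300                                  # give the node time to rejoin
    done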