Inline.

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna <isa...@xsense.co> wrote:
> Hi Jeff,
>
> My data is partitioned by a sourceId and metric. A source is usually
> active for up to a year, after which there are no additional writes for
> the partition and reads become scarce. So although this is not an
> explicit time component, it is time-based. Will that suffice?

I guess it means that a single read may touch a year of sstables. Not
great, but perhaps not fatal. Hopefully your reads avoid that in practice.
We'd need the full schema to be very sure (does the clustering column
include month/day? If so, there are cases where that can help exclude
sstables).

> If I use a week bucket we will be able to serve the last few days' reads
> from one file and the last month from ~5, which are the most common
> queries. Do you think a month bucket is a good idea? That would allow
> reading from one file most of the time, but the size of each SSTable
> will be ~5 times bigger.

It'll be 1-4 for the most common reads (up to 4 for same-bucket reads,
because STCS in the first bucket is triggered at min_threshold=4), and 5
max. Seems reasonable, and way better than the 200 or so you're doing now.
(I've put a rough sketch of what I mean by a bucketed key further down,
right after the date-bucketing question in the quoted thread.)

> When changing the compaction strategy via JMX, do I need to issue the
> alter table command at the end so it will be reflected in the schema, or
> is that taken care of automatically? (I am using Cassandra 3.11.11)

At the end, yes. (There's a rough sketch of the JMX call at the bottom of
this mail.)

> Thanks a lot for your help.
>
> From: Jeff Jirsa <jji...@gmail.com>
> Sent: Tuesday, September 14, 2021 4:51 PM
> To: cassandra <user@cassandra.apache.org>
> Subject: Re: TWCS on Non TTL Data
>
> On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna <isa...@xsense.co> wrote:
>
> Hi,
>
> I have a table that stores time series data. The data is not TTLed since
> we want to retain it for the foreseeable future, and there are no
> updates or deletes. (Deletes could happen in the rare case that some
> scrambled data reaches the table, but that is extremely rare.)
>
> We do a constant write of incoming data to the table, ~5 million a day,
> mostly newly generated data from the past week, but we also get old data
> that got stuck somewhere, though not that often. Usually our reads are
> for the most recent data (the last one to three months), but we do fetch
> older data for specific time periods in the past as well.
>
> Lately we have been facing performance trouble with this table (see the
> histogram below). When compaction is working on the table, performance
> gets even worse and reads can take 10-20 seconds!
>
> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>                            (micros)      (micros)         (bytes)
> 50%           215.00         17.08       89970.66            1916         149
> 75%           446.00         24.60      223875.79            2759         215
> 95%           535.00         35.43      464228.84            8239         642
> 98%           642.00         51.01      668489.53           24601        1916
> 99%           642.00         73.46      962624.93           42510        3311
> Min             0.00          2.30       10090.81              43           0
> Max           770.00       1358.10     2395318.86         5839588      454826
>
> As you can see, we are scanning hundreds of sstables. It turns out we
> are using DTCS (min: 4, max: 32), the table folder contains ~33K files
> (~130 GB) per node (cleanup pending after increasing the cluster), and
> compaction takes a very long time to complete.
>
> As I understand it, DTCS is deprecated, so my questions:
>
> 1. Should we switch to TWCS even though our data is not TTLed? Since we
> do not delete at all, can we still use it? Will it improve performance?

> It will probably be better than DTCS here, but you'll still have
> potentially lots of sstables over time. Lots of sstables in itself isn't
> a big deal; the problem comes from scanning more than a handful on each
> read. Does your table have some form of date bucketing to avoid touching
> old data files?
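Coming back to that date-bucketing question, and to the month bucket
discussed at the top of this mail: here is a minimal sketch of what a
bucketed key could look like. All the names (readings, source_id, metric,
month_bucket, event_time) are invented, since I haven't seen your schema,
and the Java below is plain JDK code that just prints the CQL and computes
the bucket; treat it as a sketch, not a drop-in schema.

import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

public class MonthBucketSketch {

    // "2021-09"-style bucket: one partition per (source, metric, month).
    static String monthBucket(Instant eventTime) {
        return YearMonth.from(eventTime.atZone(ZoneOffset.UTC)).toString();
    }

    public static void main(String[] args) {
        // Hypothetical table: adjust names and types to the real schema.
        System.out.println(
            "CREATE TABLE readings (\n"
          + "  source_id    uuid,\n"
          + "  metric       text,\n"
          + "  month_bucket text,\n"
          + "  event_time   timestamp,\n"
          + "  value        double,\n"
          + "  PRIMARY KEY ((source_id, metric, month_bucket), event_time)\n"
          + ") WITH CLUSTERING ORDER BY (event_time DESC);");

        // The application computes the bucket from the event timestamp on the
        // write path, and enumerates the buckets covered by the requested
        // range on the read path.
        Instant now = Instant.now();
        System.out.println("-- bucket for " + now + " is " + monthBucket(now));
    }
}

With something like that, a "last month" query only names one or two
partitions per source/metric, and the bloom filters plus the TWCS windows
keep the number of sstables touched per read small.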
> 2. If we do switch, I am thinking of using a time window of a week; this
> way a read will scan tens of sstables instead of the hundreds it does
> today. Does that sound reasonable?

> 10s is better than hundreds, but it's still a lot.

> 3. Is there a recommended size of a window bucket in terms of disk space?

> When I wrote it, I wrote it for a use case that had 30 windows over the
> whole set of data. Since then, I've seen it used with anywhere from 5 to
> 60 buckets.
>
> With no TTL, you're effectively doing infinite buckets. So the only way
> to ensure you're not touching too many sstables is to put the date (in
> some form) into the partition key and let the database use that (plus
> bloom filters) to avoid reading too many sstables.

> 4. If TWCS is not a good idea, should I switch to STCS instead? Could
> that yield better performance than the current situation?

> LCS will give you better read performance. STCS will probably be better
> than DTCS given the 215-sstable p50 you're seeing (which is crazy btw,
> I'm surprised you're not just OOMing).

> 5. What are the risks of changing the compaction strategy on a
> production system? Can it be done on the fly, or is it better to go
> through a full test and backup cycle?

> The risk is that you trigger a ton of compactions, which drops the
> performance of the whole system all at once and your front-door queries
> all time out.
>
> You can approach this a few ways:
>
> - Use the JMX endpoint to change compaction on one instance at a time
> (rather than doing it in the schema), which lets you control how many
> nodes are re-writing all their data at any given point in time.
>
> - You can make an entirely new table, and then populate it by reading
> from the old one and writing to the new one, so you don't have the
> massive compaction kick off.
>
> - You can use user-defined compaction to force-compact some of those
> 33K sstables into fewer sstables in advance, hopefully taking away some
> of the pain you're seeing, before you fire off the big compaction.
>
> The third hint above (user-defined compaction) will make TWCS less
> effective, because TWCS uses the max timestamp per sstable for
> bucketing, and you'd be merging sstables and losing that granularity.
>
> Really though, the main thing you need to do is get a time component
> into your partition key so you avoid scanning every sstable looking for
> data. Either that, or bite the bullet and use LCS so the compaction
> system keeps things at a manageable level for reads.

> All input will be appreciated,
> Thank you
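As promised above, a rough sketch of the per-node JMX change. This is from
memory for 3.11, so verify the MBean name (type=Tables here; older
releases register it as type=ColumnFamilies) and the
CompactionParametersJson attribute with jconsole against one of your nodes
before relying on it; the keyspace/table names and the window settings
below are placeholders.

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SwitchOneNodeToTwcs {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";  // one node at a time
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        // If JMX auth is enabled, pass a credentials map instead of null.
        JMXConnector jmx = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbs = jmx.getMBeanServerConnection();
            // Table-level MBean; verify the exact ObjectName with jconsole.
            ObjectName table = new ObjectName(
                    "org.apache.cassandra.db:type=Tables,keyspace=my_ks,table=my_table");

            // Read the current value first to confirm the attribute and its JSON shape.
            System.out.println("before: "
                    + mbs.getAttribute(table, "CompactionParametersJson"));

            String twcs = "{\"class\":\"TimeWindowCompactionStrategy\","
                    + "\"compaction_window_unit\":\"DAYS\","
                    + "\"compaction_window_size\":\"7\"}";
            // This change is local to the node and is not persisted: the node goes
            // back to whatever the schema says on restart, which is why you issue
            // the ALTER TABLE once every node has been migrated.
            mbs.setAttribute(table, new Attribute("CompactionParametersJson", twcs));

            System.out.println("after:  "
                    + mbs.getAttribute(table, "CompactionParametersJson"));
        } finally {
            jmx.close();
        }
    }
}

Run it against one node, watch nodetool compactionstats until the rewrite
settles, then move on to the next node, and finish with the ALTER TABLE so
the schema matches what the nodes are actually doing.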