Inline.

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna <isa...@xsense.co> wrote:
> Hi Jeff,
>
> My data is partitioned by a sourceId and metric. A source is usually
> active for up to a year, after which there are no additional writes for
> the partition and reads become scarce. So although this is not an
> explicit time component, it is time-based. Will that suffice?

I guess it means that a single read may touch a year of sstables. Not
great, but perhaps not fatal. Hopefully your reads avoid that in practice.
We'd need the full schema to be very sure (does the clustering column
include month/day? If so, there are cases where that can help exclude
sstables).

> If I use a week bucket we will be able to serve the last few days' reads
> from one file and the last month from ~5, which are the most common
> queries. Do you think a month bucket is a good idea? That would allow
> reading from one file most of the time, but the size of each SSTable
> will be ~5 times bigger.

It'll be 1-4 for the most common reads (up to 4 for same-bucket reads,
because STCS in the first bucket is triggered at min_threshold=4), and 5
max. Seems reasonable, and way better than the 200 or so you're doing now.
(I've put a rough sketch of what I mean by a bucketed key further down,
right after the date-bucketing question in the quoted thread.)

> When changing the compaction strategy via JMX, do I need to issue the
> alter table command at the end so it will be reflected in the schema, or
> is that taken care of automatically? (I am using Cassandra 3.11.11)

At the end, yes. (There's a rough sketch of the JMX call at the bottom of
this mail.)

> Thanks a lot for your help.
>
> From: Jeff Jirsa <jji...@gmail.com>
> Sent: Tuesday, September 14, 2021 4:51 PM
> To: cassandra <user@cassandra.apache.org>
> Subject: Re: TWCS on Non TTL Data
>
> On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna <isa...@xsense.co> wrote:
>
> Hi,
>
> I have a table that stores time series data. The data is not TTLed since
> we want to retain it for the foreseeable future, and there are no
> updates or deletes. (Deletes could happen in the rare case that some
> scrambled data reaches the table, but that is extremely rare.)
>
> We do a constant write of incoming data to the table, ~5 million a day,
> mostly newly generated data from the past week, but we also get old data
> that got stuck somewhere, though not that often. Usually our reads are
> for the most recent data (the last one to three months), but we do fetch
> older data for specific time periods in the past as well.
>
> Lately we have been facing performance trouble with this table (see the
> histogram below). When compaction is working on the table, performance
> gets even worse and reads can take 10-20 seconds!
>
> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>                            (micros)      (micros)         (bytes)
> 50%           215.00         17.08       89970.66            1916         149
> 75%           446.00         24.60      223875.79            2759         215
> 95%           535.00         35.43      464228.84            8239         642
> 98%           642.00         51.01      668489.53           24601        1916
> 99%           642.00         73.46      962624.93           42510        3311
> Min             0.00          2.30       10090.81              43           0
> Max           770.00       1358.10     2395318.86         5839588      454826
>
> As you can see, we are scanning hundreds of sstables. It turns out we
> are using DTCS (min: 4, max: 32), the table folder contains ~33K files
> (~130 GB) per node (cleanup pending after increasing the cluster), and
> compaction takes a very long time to complete.
>
> As I understand it, DTCS is deprecated, so my questions:
>
> 1. Should we switch to TWCS even though our data is not TTLed? Since we
> do not delete at all, can we still use it? Will it improve performance?

> It will probably be better than DTCS here, but you'll still have
> potentially lots of sstables over time. Lots of sstables in itself isn't
> a big deal; the problem comes from scanning more than a handful on each
> read. Does your table have some form of date bucketing to avoid touching
> old data files?
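Coming back to that date-bucketing question, and to the month bucket
discussed at the top of this mail: here is a minimal sketch of what a
bucketed key could look like. All the names (readings, source_id, metric,
month_bucket, event_time) are invented, since I haven't seen your schema,
and the Java below is plain JDK code that just prints the CQL and computes
the bucket; treat it as a sketch, not a drop-in schema.

import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

public class MonthBucketSketch {

    // "2021-09"-style bucket: one partition per (source, metric, month).
    static String monthBucket(Instant eventTime) {
        return YearMonth.from(eventTime.atZone(ZoneOffset.UTC)).toString();
    }

    public static void main(String[] args) {
        // Hypothetical table: adjust names and types to the real schema.
        System.out.println(
            "CREATE TABLE readings (\n"
          + "  source_id    uuid,\n"
          + "  metric       text,\n"
          + "  month_bucket text,\n"
          + "  event_time   timestamp,\n"
          + "  value        double,\n"
          + "  PRIMARY KEY ((source_id, metric, month_bucket), event_time)\n"
          + ") WITH CLUSTERING ORDER BY (event_time DESC);");

        // The application computes the bucket from the event timestamp on the
        // write path, and enumerates the buckets covered by the requested
        // range on the read path.
        Instant now = Instant.now();
        System.out.println("-- bucket for " + now + " is " + monthBucket(now));
    }
}

With something like that, a "last month" query only names one or two
partitions per source/metric, and the bloom filters plus the TWCS windows
keep the number of sstables touched per read small.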
> 2. If we do switch, I am thinking of using a time window of a week; this
> way a read will scan tens of sstables instead of the hundreds it does
> today. Does that sound reasonable?

> 10s is better than hundreds, but it's still a lot.

> 3. Is there a recommended size of a window bucket in terms of disk space?

> When I wrote it, I wrote it for a use case that had 30 windows over the
> whole set of data. Since then, I've seen it used with anywhere from 5 to
> 60 buckets.
>
> With no TTL, you're effectively doing infinite buckets. So the only way
> to ensure you're not touching too many sstables is to put the date (in
> some form) into the partition key and let the database use that (plus
> bloom filters) to avoid reading too many sstables.

> 4. If TWCS is not a good idea, should I switch to STCS instead? Could
> that yield better performance than the current situation?

> LCS will give you better read performance. STCS will probably be better
> than DTCS given the 215-sstable p50 you're seeing (which is crazy btw,
> I'm surprised you're not just OOMing).

> 5. What are the risks of changing the compaction strategy on a
> production system? Can it be done on the fly, or is it better to go
> through a full test and backup cycle?

> The risk is that you trigger a ton of compactions, which drops the
> performance of the whole system all at once and your front-door queries
> all time out.
>
> You can approach this a few ways:
>
> - Use the JMX endpoint to change compaction on one instance at a time
> (rather than doing it in the schema), which lets you control how many
> nodes are re-writing all their data at any given point in time.
>
> - You can make an entirely new table, and then populate it by reading
> from the old one and writing to the new one, so you don't have the
> massive compaction kick off.
>
> - You can use user-defined compaction to force-compact some of those
> 33K sstables into fewer sstables in advance, hopefully taking away some
> of the pain you're seeing, before you fire off the big compaction.
>
> The third hint above (user-defined compaction) will make TWCS less
> effective, because TWCS uses the max timestamp per sstable for
> bucketing, and you'd be merging sstables and losing that granularity.
>
> Really though, the main thing you need to do is get a time component
> into your partition key so you avoid scanning every sstable looking for
> data. Either that, or bite the bullet and use LCS so the compaction
> system keeps things at a manageable level for reads.

> All input will be appreciated,
> Thank you
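As promised above, a rough sketch of the per-node JMX change. This is from
memory for 3.11, so verify the MBean name (type=Tables here; older
releases register it as type=ColumnFamilies) and the
CompactionParametersJson attribute with jconsole against one of your nodes
before relying on it; the keyspace/table names and the window settings
below are placeholders.

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SwitchOneNodeToTwcs {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";  // one node at a time
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        // If JMX auth is enabled, pass a credentials map instead of null.
        JMXConnector jmx = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbs = jmx.getMBeanServerConnection();
            // Table-level MBean; verify the exact ObjectName with jconsole.
            ObjectName table = new ObjectName(
                    "org.apache.cassandra.db:type=Tables,keyspace=my_ks,table=my_table");

            // Read the current value first to confirm the attribute and its JSON shape.
            System.out.println("before: "
                    + mbs.getAttribute(table, "CompactionParametersJson"));

            String twcs = "{\"class\":\"TimeWindowCompactionStrategy\","
                    + "\"compaction_window_unit\":\"DAYS\","
                    + "\"compaction_window_size\":\"7\"}";
            // This change is local to the node and is not persisted: the node goes
            // back to whatever the schema says on restart, which is why you issue
            // the ALTER TABLE once every node has been migrated.
            mbs.setAttribute(table, new Attribute("CompactionParametersJson", twcs));

            System.out.println("after:  "
                    + mbs.getAttribute(table, "CompactionParametersJson"));
        } finally {
            jmx.close();
        }
    }
}

Run it against one node, watch nodetool compactionstats until the rewrite
settles, then move on to the next node, and finish with the ALTER TABLE so
the schema matches what the nodes are actually doing.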