By switching to this method, you will have two tables - a short-term storage table and a permanent storage table. The short-term table will use TWCS and have a TTL on it. The permanent table won't have a TTL, and it will use STCS or LCS instead of TWCS. Whenever the application writes (inserts or updates) anything, it should always write to both tables. When the application wants to read, it may choose to read from either the short-term or the permanent table, depending on the expected write time of the data.
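
A minimal sketch of what that could look like, with illustrative keyspace/table/column names and placeholder TTL and window values (adjust those to your retention needs):

    CREATE TABLE metrics.readings_recent (
        source_id  uuid,
        metric     text,
        ts         timestamp,
        value      double,
        PRIMARY KEY ((source_id, metric), ts)
    ) WITH default_time_to_live = 2592000   -- 30 days; fully expired SSTables get dropped whole
      AND compaction = {'class': 'TimeWindowCompactionStrategy',
                        'compaction_window_unit': 'DAYS',
                        'compaction_window_size': '7'};

    CREATE TABLE metrics.readings_archive (
        source_id  uuid,
        metric     text,
        ts         timestamp,
        value      double,
        PRIMARY KEY ((source_id, metric), ts)
    ) WITH compaction = {'class': 'LeveledCompactionStrategy'};

The application writes every insert/update to both tables, then routes reads for recent time ranges to readings_recent and everything else to readings_archive.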

On 19/09/2021 08:55, Isaeed Mohanna wrote:

The point is that I am NOT using TTL and I need to keep the data, so when I do the switch to TWCS, will the old files be recompacted, or will they remain the same and only new data coming in will use TWCS?

*From:* Bowen Song <bo...@bso.ng>
*Sent:* Friday, September 17, 2021 9:04 PM
*To:* user@cassandra.apache.org
*Subject:* Re: TWCS on Non TTL Data

If you use TWCS with TTL, the old SSTables won't be compacted; instead, each entire SSTable file will get dropped once all of its data has expired. I don't think you will need to manage the compaction or cleanup at all, as they are automatic. There's no space limit on the table holding the near-term data other than the overall free disk space; there's only a time limit on that table.

On 17/09/2021 16:51, Isaeed Mohanna wrote:

    Thanks for the help,

    How does the compaction run? Does it clean up the old files while
    it is running or only at the end? I want to manage the free space
    so we do not run out while it's running.

    *From:* Jim Shaw <jxys...@gmail.com> <mailto:jxys...@gmail.com>
    *Sent:* Wednesday, September 15, 2021 3:49 PM
    *To:* user@cassandra.apache.org
    *Subject:* Re: TWCS on Non TTL Data

    You may try rolling up the data, i.e. one table keeps only 1 month
    of data, and old data is rolled up into a table that keeps a year
    of data.

    Thanks,

    Jim

    On Wed, Sep 15, 2021 at 1:26 AM Isaeed Mohanna <isa...@xsense.co>
    wrote:

        My clustering column is the time series timestamp: basically
        sourceId and metric type form the partition key and the
        timestamp is the clustering key; the rest of the fields are
        just values outside of the primary key. Our read requests are
        simply "give me the values for a time range of a specific
        sourceId, metric combination". So I am guessing that during a
        read the SSTables that contain the partition key will be found,
        and out of those the ones that are outside the requested range
        will be excluded, correct?
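
        To sketch it (simplified, and the real column names differ a bit):

            CREATE TABLE readings (
                source_id  uuid,
                metric     text,
                ts         timestamp,
                value      double,   -- plus a few more non-key value columns
                PRIMARY KEY ((source_id, metric), ts)
            );

            -- typical read: a time range for one sourceId + metric combination
            SELECT value FROM readings
            WHERE source_id = ? AND metric = ?
              AND ts >= ? AND ts < ?;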

        In practice our queries cover up to a month by default; only
        rarely do we fetch more, for example when someone is exporting
        the data.

        In reality we also get old data, that is, a source will send
        its information late: instead of sending it in real time it
        will send all of the last month's/week's/day's data at once. In
        that case I guess the data will end up in the current bucket;
        will that affect performance?

        Assuming I start with a 1 week bucket, I could later change
        the time window, right?

        Thanks

        *From:* Jeff Jirsa <jji...@gmail.com>
        *Sent:* Tuesday, September 14, 2021 10:35 PM
        *To:* cassandra <user@cassandra.apache.org>
        *Subject:* Re: TWCS on Non TTL Data

        Inline

        On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna
        <isa...@xsense.co> wrote:

            Hi Jeff

            My data is partitioned by a sourceId and metric; a source
            is usually active for up to a year, after which there are
            no additional writes for the partition and reads become
            scarce. So although this is not an explicit time component,
            it is time based; will that suffice?

        I guess it means that a single read may touch a year of
        sstables. Not great, but perhaps not fatal. Hopefully your
        reads avoid that in practice. We'd need the full schema to be
        very sure (does the clustering column include month/day? If so,
        there are cases where that can help exclude sstables).

            If I use a week bucket we will be able to serve reads for
            the last few days from one file and for the last month from
            ~5, which are the most common queries. Do you think a month
            bucket is a good idea? That would allow reading from one
            file most of the time, but the size of each SSTable will be
            ~5 times bigger.

        It'll be 1-4 for the most common reads (up to 4 for same-bucket
        reads, because STCS in the first bucket is triggered at
        min_threshold=4), and 5 max; that seems reasonable. Way better
        than the 200 or so you're doing now.

            When changing the compaction strategy via JMX, do I need
            to issue the ALTER TABLE command at the end so it will be
            reflected in the schema, or is that taken care of
            automatically? (I am using Cassandra 3.11.11)

        At the end, yes.
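
        That is, once every node has been switched via JMX, run the
        matching schema change; the keyspace and table names below are
        placeholders:

            ALTER TABLE ks.readings
              WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                                 'compaction_window_unit': 'DAYS',
                                 'compaction_window_size': '7'};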

            Thanks a lot for your help.

            *From:* Jeff Jirsa <jji...@gmail.com>
            *Sent:* Tuesday, September 14, 2021 4:51 PM
            *To:* cassandra <user@cassandra.apache.org>
            *Subject:* Re: TWCS on Non TTL Data

            On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna
            <isa...@xsense.co> wrote:

                Hi

                I have a table that stores time series data. The data
                is not TTLed since we want to retain it for the
                foreseeable future, and there are no updates or deletes
                (deletes could happen rarely in case some scrambled
                data reaches the table, but it's extremely rare).

                Usually we do a constant write of incoming data to the
                table, ~5 million records a day, mostly newly generated
                data from the past week, but we also get old data that
                got stuck somewhere, though not that often. Usually our
                reads are for the most recent data, the last one to
                three months, but we do fetch older data as well, for a
                specific time period in the past.

                Lately we have been facing performance trouble with
                this table, see the histogram below. When compaction is
                working on the table, read latencies even reach 10-20
                seconds!!

                Percentile  SSTables   Write Latency    Read Latency   Partition Size   Cell Count
                                            (micros)        (micros)          (bytes)
                50%           215.00           17.08        89970.66             1916          149
                75%           446.00           24.60       223875.79             2759          215
                95%           535.00           35.43       464228.84             8239          642
                98%           642.00           51.01       668489.53            24601         1916
                99%           642.00           73.46       962624.93            42510         3311
                Min             0.00            2.30        10090.81               43            0
                Max           770.00         1358.10      2395318.86          5839588       454826

                As you can see we are scanning hundreds of SSTables.
                It turns out we are using DTCS (min: 4, max: 32), the
                table folder contains ~33K files totalling ~130GB per
                node (cleanup pending after increasing the cluster),
                and compaction takes a very long time to complete.

                As I understand it, DTCS is deprecated, so my questions are:

                 1. Should we switch to TWCS even though our data is
                    not TTLed? Since we do not delete at all, can we
                    still use it? Will it improve performance?

            It will probably be better than DTCS here, but you'll
            still have potentially lots of sstables over time.

            Lots of sstables in itself isn't a big deal; the problem
            comes from scanning more than a handful on each read. Does
            your table have some form of date bucketing to avoid
            touching old data files?

                 2. If we do switch, I am thinking of using a time
                    window of a week; this way reads will scan tens of
                    SSTables instead of the hundreds of today. Does
                    that sound reasonable?

            10s is better than hundreds, but it's still a lot.

                 3. Is there a recommended size of a window bucket in
                    terms of disk space?

            When I wrote it, I wrote it for a use case that had 30
            windows over the whole set of data. Since then, I've seen
            it used with anywhere from 5 to 60 buckets.

            With no TTL, you're effectively doing infinite buckets. So
            the only way to ensure you're not touching too many
            sstables is to put the date (in some form) into the
            partition key and let the database use that (+bloom
            filters) to avoid reading too many sstables.
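
            As a sketch (names and bucket granularity here are
            illustrative), that usually means something like a month
            component in the partition key, with the application
            computing the bucket(s) to query for each read:

                CREATE TABLE readings_bucketed (
                    source_id  uuid,
                    metric     text,
                    month      text,     -- e.g. '2021-09', derived from ts by the writer
                    ts         timestamp,
                    value      double,
                    PRIMARY KEY ((source_id, metric, month), ts)
                );

                -- a one-month read now only touches sstables containing this partition
                SELECT value FROM readings_bucketed
                WHERE source_id = ? AND metric = ? AND month = '2021-09'
                  AND ts >= '2021-09-01' AND ts < '2021-10-01';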

                 4. If TWCS is not a good idea, should I switch to
                    STCS instead? Could that yield better performance
                    than the current situation?

            LCS will give you better read performance. STCS will
            probably be better than DTCS given the 215 sstable p50
            you're seeing (which is crazy btw, I'm surprised you're
            not just OOMing)
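
            For reference, the LCS switch itself is just a compaction
            change; the table name below is a placeholder, and 160 MB
            is the default target SSTable size:

                ALTER TABLE ks.readings
                  WITH compaction = {'class': 'LeveledCompactionStrategy',
                                     'sstable_size_in_mb': '160'};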

                 5. What are the risks of changing the compaction
                    strategy on a production system? Can it be done on
                    the fly, or is it better to go through a full test
                    and backup cycle?

            The risk is that you trigger a ton of compactions, which
            drops the performance of the whole system all at once, and
            your front door queries all time out.

            You can approach this a few ways:

            - Use the JMX endpoint to change compaction on one
            instance at a time (rather than doing it in the schema),
            which lets you control how many nodes are re-writing all
            their data at any given point in time (see the first
            sketch after this list)

            - You can make an entirely new table, and then populate it
            by reading from the old one and writing to the new one, and
            then you don't have the massive compaction kick off

            - You can use user defined compaction to force compact
            some of those 33k sstables into fewer sstables in advance,
            hopefully taking away some of the pain you're seeing,
            before you fire off the big compaction (see the second
            sketch below)
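
            For the first option, the per-node change on 3.11 is done
            by setting the CompactionParametersJson attribute on the
            table's MBean. A sketch using jmxterm as the JMX client
            (keyspace/table names are placeholders, and the exact MBean
            name can vary slightly between versions):

                # connect to the node's JMX port (7199 by default)
                java -jar jmxterm.jar -l localhost:7199

                # select the table's MBean and set the compaction parameters
                bean org.apache.cassandra.db:type=Tables,keyspace=ks,table=readings
                set CompactionParametersJson {"class":"TimeWindowCompactionStrategy","compaction_window_unit":"DAYS","compaction_window_size":"7"}

            The JMX change is local to that node and doesn't persist
            across a restart, which is why the ALTER TABLE still has to
            be issued at the end.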

            The 3rd hint above - user defined compaction - will make
            TWCS less effective, because TWCS uses the max timestamp
            per sstable for bucketing, and you'd be merging sstables
            and losing granularity.
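
            For the user defined compaction route, the operation lives
            on the CompactionManager MBean; roughly like this (the
            SSTable file names are placeholders, and the expected
            argument format differs a bit between versions):

                # in a JMX client connected to the node
                bean org.apache.cassandra.db:type=CompactionManager
                run forceUserDefinedCompaction mc-1234-big-Data.db,mc-1235-big-Data.db,mc-1236-big-Data.db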

            Really though, the main thing you need to do is get a time
            component in your partition key so you avoid scanning
            every sstable looking for data, either that or bite the
            bullet and use LCS so the compaction system keeps it at a
            manageable level for reads.

                All input will be appreciated,

                Thank you
