I don't recall ever seeing a recommendation to run major compactions periodically. Can you share the source of your information?

During the major compaction, the server will be under heavy load, and it will need to rewrite ALL sstables. This actually hurts the read performance while the compaction is running.

The most important factor in read performance is the amount of data each node has to scan to complete the read query. Large partitions, too many tombstones, partitions spread across too many sstables, etc. all hurt performance. You will need to find the bottleneck and act on it in order to improve read performance.
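As a rough illustration of "find the bottleneck", here is a minimal Python sketch that checks a few read-path numbers against thresholds. The class, function and threshold values are all hypothetical and only for illustration. The inputs are the kind of figures you could read off `nodetool tablehistograms` and `nodetool tablestats`, entered by hand.

```python
from dataclasses import dataclass


@dataclass
class ReadPathStats:
    max_partition_mb: float         # largest partition in the table, in MB
    sstables_per_read_p99: float    # SSTables touched per read (99th percentile)
    tombstones_per_read_p99: float  # tombstones scanned per read (99th percentile)


# Hypothetical thresholds for illustration only; tune them for your own cluster.
LARGE_PARTITION_MB = 100
MANY_SSTABLES_PER_READ = 4
MANY_TOMBSTONES_PER_READ = 1000


def likely_bottlenecks(stats):
    """Return a list of read-path problems suggested by the given stats."""
    findings = []
    if stats.max_partition_mb > LARGE_PARTITION_MB:
        findings.append("large partitions")
    if stats.sstables_per_read_p99 > MANY_SSTABLES_PER_READ:
        findings.append("partitions spread across too many sstables")
    if stats.tombstones_per_read_p99 > MANY_TOMBSTONES_PER_READ:
        findings.append("too many tombstones")
    return findings


# Example with made-up numbers: an 80 MB max partition, 6 SSTables per read,
# and 50 tombstones per read.
example = ReadPathStats(max_partition_mb=80,
                        sstables_per_read_p99=6,
                        tombstones_per_read_p99=50)
print(likely_bottlenecks(example))
```

An empty result does not mean the table is healthy. CPU, disk IO and GC pressure need to be checked separately.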

Artificially spreading the data from one LCS table into many tables with identical schema is not likely to improve the read performance. The only benefit you get is more compaction parallelisation, and that may further hurt the read performance if the bottleneck is CPU usage, disk IO, or GC.
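To make the level arithmetic concrete, here is a small Python sketch. It assumes the LCS defaults of 160 MB sstables and a fanout of 10, ignores L0, and assumes every level is filled to its target size. The 100 GiB of data per node is a made-up figure, not taken from your cluster.

```python
def lcs_level_capacity_bytes(level, sstable_size_mb=160, fanout=10):
    """Approximate target capacity of LCS level `level` (level >= 1)."""
    return (fanout ** level) * sstable_size_mb * 1024 * 1024


def deepest_level_needed(data_bytes, sstable_size_mb=160, fanout=10):
    """Smallest level L such that levels 1..L together can hold `data_bytes`.

    Simplification: ignores L0 and assumes each level is filled to its target.
    """
    level = 1
    capacity = lcs_level_capacity_bytes(level, sstable_size_mb, fanout)
    while capacity < data_bytes:
        level += 1
        capacity += lcs_level_capacity_bytes(level, sstable_size_mb, fanout)
    return level


GIB = 1024 ** 3
per_node_bytes = 100 * GIB  # hypothetical amount of data per node for this table

one_table = deepest_level_needed(per_node_bytes)         # everything in one table
ten_tables = deepest_level_needed(per_node_bytes // 10)  # each of ten smaller tables
print(one_table, ten_tables)  # prints "3 2"
```

Under these assumptions the split does remove one level per table. But with LCS a partition lives in at most one sstable per level above L0, so a read saves at most about one sstable, while the node still stores and compacts the same total amount of data.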

If you know the table is heavily read, and you have a performance issue with that, maybe it's time to redesign the table schema and optimise for the most frequently used read queries.

On 01/07/2022 11:29, MyWorld wrote:

Michiel, this is not our use case. Since our data is not time series, there is no TTL in our case.

Bowen, I think it is generally recommended to run a major compaction once a week for better read performance.

On Fri, Jul 1, 2022, 6:52 AM Michiel Saelen <michiel.sae...@skyline.be> wrote:

    Hi,

    We used to run a compaction job every week to keep disk usage
    under control, as the table mainly held data that had to expire
    via TTL, and we were also using levelled compaction.

    In our case we had different TTLs in the same table, and the
    partitions were spread over multiple SSTables. Because the
    partitions never closed and therefore kept receiving changes, we
    ended up with repairs that had to cover a lot of SSTables, which
    is heavy on memory and CPU.
    By changing the compaction strategy to TWCS
    <https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html>,
    splitting the table into separate tables, each with its own TTL,
    and adding a component to the partition key (e.g. the day of the
    year) to close the partitions so they can be "marked" as repaired,
    we were able to get rid of these heavy compaction jobs.

    Not sure if you have the same use case, just wanted to share this
    info.

    Kind regards,

    Michiel

    *Michiel Saelen* | Principal Solution Architect


    *From:* Bowen Song <bo...@bso.ng>
    *Sent:* Friday, July 1, 2022 08:48
    *To:* user@cassandra.apache.org
    *Subject:* Re: Query around Data Modelling -2


        


    And why do you do that?

    On 30/06/2022 16:35, MyWorld wrote:

        We run a major compaction once a week.

        On Thu, Jun 30, 2022, 8:14 PM Bowen Song <bo...@bso.ng> wrote:

            I noticed the phrase "running a weekly repair and
            compaction job".

            What do you mean by a weekly compaction job? Have you
            disabled auto-compaction on the table and are relying on
            weekly scheduled compactions? Or are you running weekly
            major compactions? Neither of these sounds right.

            On 30/06/2022 15:03, MyWorld wrote:

                Hi all,

                Another query around data modelling.

                We have an existing table with the structure below:

                Table(PK, CK, col1, col2, col3, col4, col5)

                Each PK here has 1k to 10k clustering keys, and each
                partition is 10 MB to 80 MB in size. Overall we have
                more than 100 million partitions. We have also set
                levelled compaction to get better read response
                times.

                We are currently on Cassandra 3.11.x. When we run the
                weekly repair and compaction job, this model, because
                of levelled compaction (populated up to level 3),
                consumes heavy CPU resources and impacts DB performance.

                What if we divide this table into 10 tables, each
                containing 1/10 of the partitions? Each table would
                then be limited to levelled compaction up to level 2.
                I think this would ease both reads and compaction.

                What is your opinion on this?

                Even if we upgrade to version 4.0, is the second model OK?
