RE: Query around Data Modelling -2

Michiel Saelen Thu, 30 Jun 2022 18:22:15 -0700

Hi,

We used to run a compaction job every week to keep disk usage under control, 
as the table mainly held data that has to expire via TTL and we were also 
using levelled compaction.
In our case we had different TTLs in the same table and the partitions were 
spread over multiple SSTables. Because the partitions never closed and 
therefore kept receiving changes, we ended up with repairs that had to cover 
a lot of SSTables, which is heavy on memory and CPU.
By changing the compaction strategy to 
TWCS<https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html>,
 splitting the table into separate tables, each with its own TTL, and adding a 
component to the partition key (e.g. the day of the year) so that partitions 
close and can be “marked” as repaired, we were able to get rid of these heavy 
compaction actions.
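
As a rough illustration of the partition-key bucketing idea (a sketch only, 
not the actual schema: the helper names are hypothetical, and prefixing the 
year so that buckets are not reused across years is an added assumption), 
the extra key component could be derived from the write timestamp like this:

```python
from datetime import datetime, timezone
from typing import Tuple


def day_bucket(ts: datetime) -> str:
    """Return a 'YYYY-DDD' bucket (year plus day of year), computed in UTC.

    Pass a timezone-aware datetime; a naive one is treated as local time.
    """
    ts_utc = ts.astimezone(timezone.utc)
    return f"{ts_utc.year}-{ts_utc.timetuple().tm_yday:03d}"


def bucketed_partition_key(entity_id: str, ts: datetime) -> Tuple[str, str]:
    """Hypothetical composite partition key: (original key, day bucket).

    Once the day has passed, no new writes target that partition, so it
    stops changing and stays in a small number of SSTables.
    """
    return (entity_id, day_bucket(ts))
```

Readers then have to query one partition per day in the time range of 
interest, which is the trade-off for partitions that eventually stop changing.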


Not sure if you have the same use case, just wanted to share this info.

Kind regards,
Michiel


Michiel Saelen | Principal Solution Architect
Email: michiel.sae...@skyline.be

Skyline Communications
39 Hong Kong Street #02-01 | Singapore 059678
https://www.skyline.be | +65 6920 1145

From: Bowen Song <bo...@bso.ng>
Sent: Friday, July 1, 2022 08:48
To: user@cassandra.apache.org
Subject: Re: Query around Data Modelling -2

And why do you do that?

On 30/06/2022 16:35, MyWorld wrote:
We run a major compaction once a week.

On Thu, Jun 30, 2022, 8:14 PM Bowen Song <bo...@bso.ng> wrote:

I noticed this: "running a weekly repair and compaction job".

What do you mean by a weekly compaction job? Have you disabled auto-compaction 
on the table and are relying on weekly scheduled compactions? Or are you 
running weekly major compactions? Neither of these sounds right.

On 30/06/2022 15:03, MyWorld wrote:
Hi all,

Another query around data modelling.

We have an existing table with the structure below:
Table(PK, CK, col1, col2, col3, col4, col5)

Each PK here has 1k to 10k clustering keys, and each partition is 10 MB to 
80 MB in size. Overall we have 100+ million partitions. We have also set up 
levelled compaction to get better read response times.

We are currently on Cassandra 3.11.x. When we run our weekly repair and 
compaction job, this model, because of levelled compaction (occupied up to 
Level 3), consumes heavy CPU resources and impacts DB performance.

Now what if we divide this table into 10 tables, each containing 1/10 of the 
partitions? Each table would then be limited to levelled compaction up to 
Level 2. I think this would ease reads as well as the compaction task.
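
A back-of-the-envelope sketch of the level arithmetic behind this idea. It 
assumes the LCS defaults of 160 MB SSTables and a fanout of 10 (both are 
configurable), ignores L0, and uses made-up per-node data sizes rather than 
measurements from this cluster, so treat it as an approximation only:

```python
SSTABLE_SIZE_MB = 160  # assumed default sstable_size_in_mb for LCS
FANOUT = 10            # assumed default fanout between levels


def level_capacity_mb(level: int) -> int:
    """Approximate capacity of level N (N >= 1): FANOUT**N SSTables."""
    return (FANOUT ** level) * SSTABLE_SIZE_MB


def highest_level_needed(data_mb: float) -> int:
    """Smallest N such that levels 1..N together can hold data_mb."""
    level = 1
    total = level_capacity_mb(1)
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level)
    return level


# Hypothetical example: 100 GB of this table per node, then 1/10 of that.
one_table_mb = 100 * 1024
print(highest_level_needed(one_table_mb))       # 3
print(highest_level_needed(one_table_mb / 10))  # 2
```

Under these assumptions, a tenth of the data per table does fit one level 
lower, but the total volume that has to be compacted across all ten tables 
stays the same.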

What is your opinion on this?
Even if we upgrade to version 4.0, is the second model OK?
