[jira] [Commented] (CASSANDRA-17021) Enhance Zstd support in Cassandra with dictionaries

Jon Haddad (Jira) Tue, 14 Oct 2025 02:15:14 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028461#comment-18028461
 ]


Jon Haddad commented on CASSANDRA-17021:
----------------------------------------

I like the high-level approach.  I don't object to a list of SSTables, but I 
don't think it should be required; it seems like overkill for regular users.

I don't think you even need to read through the entire SSTable.  Since murmur 
hashing gives us effectively random partition ordering, we could probably get 
by just reading <some number of chunks> at the head of the table.  How much 
data do we need to sample to be effective?
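
To make the idea concrete, here is a rough sketch of head-of-file sampling. It 
is only an illustration, not how the patch does it: the function name 
sample_head_chunks, the 16 KiB chunk size, and the chunk count are all 
assumptions, and it treats the input as a plain stream of uncompressed bytes 
rather than a real SSTable data file.

```python
import io


def sample_head_chunks(stream, chunk_size=16 * 1024, max_chunks=64):
    """Read up to max_chunks fixed-size chunks from the start of stream.

    Hypothetical helper: chunk_size and max_chunks are illustrative
    values, not settings taken from Cassandra.
    """
    samples = []
    for _ in range(max_chunks):
        chunk = stream.read(chunk_size)
        if not chunk:
            # Reached the end of the stream before hitting max_chunks.
            break
        samples.append(chunk)
    return samples


# Usage: take 4 chunks from the head of a 256 KiB in-memory stream.
data = io.BytesIO(bytes(range(256)) * 1024)
samples = sample_head_chunks(data, chunk_size=16 * 1024, max_chunks=4)
print(len(samples), sum(len(s) for s in samples))  # prints "4 65536"
```

The resulting list of byte strings is the kind of sample set a dictionary 
trainer would consume.  Varying max_chunks and comparing the resulting 
compression ratios would be one way to answer the "how much data" question 
empirically.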

What do you think?

> Enhance Zstd support in Cassandra with dictionaries
> ---------------------------------------------------
>
>                 Key: CASSANDRA-17021
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17021
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/Compression
>            Reporter: Dinesh Joshi
>            Assignee: Yifan Cai
>            Priority: Normal
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently Cassandra supports zstd compression. However, Zstd also supports 
> dictionaries to enhance not only the compression ratio but also the speed. 
> Dictionaries can show 3-4x savings. We should add support to train 
> dictionaries, ideally per SSTable, as this will yield the maximum gains.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

