[ 
https://issues.apache.org/jira/browse/CASSANDRA-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028460#comment-18028460
 ] 

Yifan Cai commented on CASSANDRA-17021:
---------------------------------------

Regarding the on-disk training, what about adding an option to 
`traincompressiondictionary` that takes a list of SSTables, or simply a flag 
that picks SSTables from a given table, and trains from them? Essentially, the 
steps would be:
  - Open each selected SSTable reader
  - Scan through data file sequentially
  - Extract uncompressed chunk data
  - Add chunks as training samples
  - Train immediately when done reading

It would address the "write once" workload Jon was concerned about and give 
operators a more reliable (and possibly more responsive) experience than live 
sampling. 
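
To make the steps listed above concrete, here is a minimal sketch in plain Java. It is not Cassandra code: `ChunkSampler`, `collectSamples`, `trainFromStreams`, the `maxSampleBytes` budget and the `trainer` callback are all hypothetical names chosen for illustration. Each `InputStream` stands in for the already-decompressed contents of one SSTable data file, since how the SSTable reader exposes uncompressed chunks is not specified here. In practice the callback would presumably wrap a Zstd dictionary trainer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not a Cassandra API): gathers dictionary training samples. */
final class ChunkSampler {
    private ChunkSampler() {}

    /**
     * Scans one stream sequentially and cuts it into chunks of at most
     * chunkLength bytes. Stops at end of stream or once maxSampleBytes have
     * been collected (the last chunk may push the total slightly over budget).
     */
    static List<byte[]> collectSamples(InputStream uncompressed, int chunkLength, long maxSampleBytes) {
        List<byte[]> samples = new ArrayList<>();
        long collected = 0;
        try {
            while (collected < maxSampleBytes) {
                byte[] chunk = uncompressed.readNBytes(chunkLength); // Java 11+
                if (chunk.length == 0) {
                    break; // end of the data file
                }
                samples.add(chunk);
                collected += chunk.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return samples;
    }

    /**
     * Visits every selected data file in order, adds its chunks as samples,
     * then trains once all files are read (or the sample budget is used up).
     * The trainer is an assumption: any function from samples to dictionary bytes.
     */
    static byte[] trainFromStreams(List<? extends InputStream> dataFiles,
                                   int chunkLength,
                                   long maxSampleBytes,
                                   Function<List<byte[]>, byte[]> trainer) {
        List<byte[]> samples = new ArrayList<>();
        long remaining = maxSampleBytes;
        for (InputStream dataFile : dataFiles) {
            if (remaining <= 0) {
                break; // sample budget exhausted
            }
            List<byte[]> fromFile = collectSamples(dataFile, chunkLength, remaining);
            for (byte[] chunk : fromFile) {
                remaining -= chunk.length;
            }
            samples.addAll(fromFile);
        }
        return trainer.apply(samples); // train immediately when done reading
    }
}
```

The sample budget is there because a trainer only needs a bounded amount of input; for a large table the scan could stop early instead of reading every SSTable in full.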

Just a quick note: the auto-training feature (yet to be implemented) will 
sample from newly flushed and compacted data and train automatically, which 
should make the UX even smoother. 

> Enhance Zstd support in Cassandra with dictionaries
> ---------------------------------------------------
>
>                 Key: CASSANDRA-17021
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17021
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/Compression
>            Reporter: Dinesh Joshi
>            Assignee: Yifan Cai
>            Priority: Normal
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently Cassandra supports zstd compression. However, Zstd also supports 
> dictionaries to enhance not only the compression ratio but also the speed. 
> Dictionaries can show 3-4x savings. We should add support to train 
> dictionaries, ideally per SSTable, as this will yield the maximum gains.


