[jira] [Commented] (CASSANDRA-20092) SSTableScanner can be vastly simplified for compaction

Jon Haddad (Jira) Wed, 16 Apr 2025 08:18:12 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945108#comment-17945108
 ]


Jon Haddad commented on CASSANDRA-20092:
----------------------------------------

Now that CASSANDRA-15452 is in 5.0, I took a look at the IO during compaction.  
In at least the one compaction I examined, Statistics.db now accounts for a 
relative share of ~30% of compaction reads:
{noformat}
14:40:29 CompactionExec 1782   R 4096    0           0.00 nb-3-big-Statistics.db
14:40:29 CompactionExec 1782   R 701     4           0.00 nb-3-big-Statistics.db
14:40:29 CompactionExec 1782   R 4096    0           0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782   R 4096    4           0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782   R 1962    8           0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782   R 262144  0           0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782   R 2115    0           0.01 nb-3-big-Data.db
14:40:29 CompactionExec 1782   R 262144  256         0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782   R 262144  512         0.06 nb-4-big-Data.db
14:40:29 CompactionExec 1782   R 262144  768         0.06 nb-4-big-Data.db
14:40:29 CompactionExec 1782   R 262144  1024        0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782   R 262144  1280        0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782   R 262144  1536        0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782   R 241123  1792        0.07 nb-4-big-Data.db
{noformat}
I'm excited to see this come to 5.0. Thanks, everyone, for the great work on this.
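
For anyone who wants to tally a trace like the one above per sstable component, here is a minimal Java sketch. It is not part of Cassandra. The class name {{ReadShare}} is hypothetical, and the column layout is an assumption, since the trace has no header row: field 5 is taken to be the read size in bytes and field 8 the file name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Cassandra class.
class ReadShare {
    /** Read count and total bytes for one sstable component. */
    static final class Tally {
        int reads;
        long bytes;
    }

    /**
     * Groups trace lines by sstable component, i.e. the text after the last
     * '-' in the file name (for example "Statistics.db" or "Data.db").
     * Assumed columns: time, command, pid, type, bytes, offset, latency, file.
     */
    static Map<String, Tally> tally(List<String> lines) {
        Map<String, Tally> byComponent = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 8) {
                continue; // skip blank or malformed lines
            }
            String file = f[7];
            String component = file.substring(file.lastIndexOf('-') + 1);
            Tally t = byComponent.computeIfAbsent(component, k -> new Tally());
            t.reads++;
            t.bytes += Long.parseLong(f[4]); // assumed: read size in bytes
        }
        return byComponent;
    }
}
```

Dividing a component's {{reads}} by the total number of lines gives its share by read count. In the excerpt above, 5 of the 14 reads hit Statistics.db, while almost all of the bytes come from Data.db.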

> SSTableScanner can be vastly simplified for compaction
> ------------------------------------------------------
>
>                 Key: CASSANDRA-20092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20092
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Normal
>             Fix For: 5.1
>
>         Attachments: ci_summary_thelastpickle_mck-20092-5.0_154.html, 
> results_details_thelastpickle_mck-20092-5.0_154.tar.xz
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> One of the main bottlenecks for compaction performance is its use of the 
> {{SSTableScanner}} class, whose main purpose is to implement partition range 
> queries. As such, it supports filtering by row and column, which is not 
> helpful to compaction. To implement the latter it must rely on the sstable's 
> index, adding a lot of complexity and inefficiency.
> Implementing a simpler version of a scanner that reads off the data file 
> directly for given spans of offsets would speed up compaction significantly.
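
The idea of reading a data file directly for given spans of offsets can be illustrated with a minimal Java sketch. This is not the implementation from this ticket: {{SpanReader}} and {{Span}} are hypothetical names, and the sketch uses only plain {{java.nio}} positional reads. It does no index lookup and no filtering. A real sstable scanner would also have to handle compression and deserialize partitions, which is omitted here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical illustration, not a Cassandra class.
class SpanReader {
    /** Half-open byte range [start, end) within the file. */
    static final class Span {
        final long start;
        final long end;

        Span(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Reads each span sequentially in chunks of at most bufferSize bytes and
     * hands every chunk to the sink. The buffer is reused, so the sink must
     * consume it before returning.
     */
    static void readSpans(Path file, Iterable<Span> spans, int bufferSize,
                          Consumer<ByteBuffer> sink) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            for (Span span : spans) {
                long pos = span.start;
                while (pos < span.end) {
                    buf.clear();
                    // Never read past the end of the current span.
                    buf.limit((int) Math.min(buf.capacity(), span.end - pos));
                    int n = ch.read(buf, pos); // positional read at offset pos
                    if (n <= 0) {
                        break; // span extends past the end of the file
                    }
                    buf.flip();
                    sink.accept(buf);
                    pos += n;
                }
            }
        }
    }
}
```

With a 256 KiB buffer, a loop like this issues large sequential reads of the kind visible for Data.db in the trace in the comment above.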



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
