[ https://issues.apache.org/jira/browse/CASSANDRA-20092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945108#comment-17945108 ]
Jon Haddad commented on CASSANDRA-20092:
----------------------------------------

Now that 15452 is in 5.0, I took a look at the IO during compaction. At least in the one I examined, I see that relatively, Statistics.db now takes up ~30% of compaction reads:

{noformat}
14:40:29 CompactionExec 1782 R 4096   0    0.00 nb-3-big-Statistics.db
14:40:29 CompactionExec 1782 R 701    4    0.00 nb-3-big-Statistics.db
14:40:29 CompactionExec 1782 R 4096   0    0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782 R 4096   4    0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782 R 1962   8    0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782 R 262144 0    0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 2115   0    0.01 nb-3-big-Data.db
14:40:29 CompactionExec 1782 R 262144 256  0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 512  0.06 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 768  0.06 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 1024 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 1280 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 1536 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 241123 1792 0.07 nb-4-big-Data.db
{noformat}

I'm excited to see this come to 5.0, thanks everyone for the great work on this.
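For anyone who wants to check the share of reads per sstable component, here is a minimal sketch that tallies the trace excerpt above. The column meanings (time, command, PID, type, bytes, offset in KB, latency in ms, file name) are an assumption, since the comment does not name the tracing tool, and the helper names `component` and `summarize` are made up for this illustration.

```python
from collections import defaultdict

# Trace lines copied from the comment above. Column meanings are assumed:
# time, command, pid, type, bytes, offset (KB), latency (ms), file name.
TRACE = """\
14:40:29 CompactionExec 1782 R 4096 0 0.00 nb-3-big-Statistics.db
14:40:29 CompactionExec 1782 R 701 4 0.00 nb-3-big-Statistics.db
14:40:29 CompactionExec 1782 R 4096 0 0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782 R 4096 4 0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782 R 1962 8 0.00 nb-4-big-Statistics.db
14:40:29 CompactionExec 1782 R 262144 0 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 2115 0 0.01 nb-3-big-Data.db
14:40:29 CompactionExec 1782 R 262144 256 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 512 0.06 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 768 0.06 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 1024 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 1280 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 262144 1536 0.07 nb-4-big-Data.db
14:40:29 CompactionExec 1782 R 241123 1792 0.07 nb-4-big-Data.db
"""


def component(filename):
    """Return the sstable component, e.g. 'Statistics.db' for 'nb-3-big-Statistics.db'."""
    return filename.rsplit("-", 1)[-1]


def summarize(trace):
    """Count read calls and bytes read per sstable component."""
    calls = defaultdict(int)
    nbytes = defaultdict(int)
    for line in trace.splitlines():
        fields = line.split()
        # Skip anything that is not an 8-column read ("R") record.
        if len(fields) != 8 or fields[3] != "R":
            continue
        comp = component(fields[7])
        calls[comp] += 1
        nbytes[comp] += int(fields[4])
    return dict(calls), dict(nbytes)


if __name__ == "__main__":
    calls, nbytes = summarize(TRACE)
    total_calls = sum(calls.values())
    total_bytes = sum(nbytes.values())
    for comp in sorted(calls):
        print(
            f"{comp}: {calls[comp]} reads ({calls[comp] / total_calls:.0%}), "
            f"{nbytes[comp]} bytes ({nbytes[comp] / total_bytes:.1%})"
        )
```

On this excerpt alone, Statistics.db accounts for 5 of the 14 read calls but well under 1% of the bytes read. That would be consistent with the ~30% figure counting read calls and not bytes, though the comment does not say which.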
> SSTableScanner can be vastly simplified for compaction
> ------------------------------------------------------
>
>                 Key: CASSANDRA-20092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20092
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Normal
>             Fix For: 5.1
>
>         Attachments: ci_summary_thelastpickle_mck-20092-5.0_154.html,
>                      results_details_thelastpickle_mck-20092-5.0_154.tar.xz
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> One of the main bottlenecks for compaction performance is its use of the
> {{SSTableScanner}} class, whose main purpose is to implement partition range
> queries and as such supports filtering by row and column that is not helpful
> to compaction. To implement the latter it must rely on the sstable's index,
> adding a lot of complexity and inefficiency.
> Implementing a simpler version of a scanner that reads off the data file
> directly for given spans of offsets would speed up compaction significantly.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)