[
https://issues.apache.org/jira/browse/CASSANDRA-20820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040393#comment-18040393
]
Andy Tolbert commented on CASSANDRA-20820:
------------------------------------------
Thanks for taking this on [~alanwang599]!
[~blambov], I would appreciate your feedback on this. There is a PR for it
here: https://github.com/apache/cassandra/pull/4476. I'm hoping to review it
and try it out myself soon.
[CASSANDRA-21041] would be incredibly useful, I think. Especially for users
coming from LCS, knowing the overlap between SSTables would be a good way to
detect when we are falling behind while using leveled scaling parameters.
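The comment does not define how CASSANDRA-21041 measures overlap. One common measure, assumed here purely for illustration, is the largest number of SSTables whose token ranges contain the same token. It can be computed with a sweep line over range endpoints. The class and record names are hypothetical. Token ranges are simplified to non-wrapping, half-open `[start, end)` intervals of `long` tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the largest number of SSTables whose token ranges
// contain the same token. Ranges are simplified to non-wrapping, half-open
// [start, end) intervals of long tokens; real ring ranges can wrap.
final class SSTableOverlap {
    record TokenRange(long start, long end) {}

    static int maxOverlap(List<TokenRange> ranges) {
        // Sweep line: +1 when a range starts, -1 when it ends.
        List<long[]> events = new ArrayList<>();
        for (TokenRange r : ranges) {
            events.add(new long[] {r.start(), +1});
            events.add(new long[] {r.end(), -1});
        }
        // Sort by token; on ties, process ends before starts so ranges that
        // merely touch ([0, 10) and [10, 20)) are not counted as overlapping.
        events.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(a[1], b[1]));
        int current = 0;
        int max = 0;
        for (long[] e : events) {
            current += (int) e[1];
            max = Math.max(max, current);
        }
        return max;
    }

    public static void main(String[] args) {
        List<TokenRange> ranges = List.of(
            new TokenRange(0, 100), new TokenRange(50, 150),
            new TokenRange(60, 70), new TokenRange(150, 200));
        System.out.println(maxOverlap(ranges)); // prints 3 (tokens 60..69 lie in three ranges)
    }
}
```

A steadily rising maximum overlap means that reads touch more SSTables per token. That is the kind of "falling behind" signal the comment is after.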
> Include Level information for UnifiedCompactionStrategy in nodetool
> tablestats output
> -------------------------------------------------------------------------------------
>
> Key: CASSANDRA-20820
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20820
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Tool/nodetool
> Reporter: Andy Tolbert
> Assignee: Alan Wang
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When using {{LeveledCompactionStrategy}} on a table, {{tablestats}}
> currently provides per-level data:
> {noformat}
> Keyspace : foo
> ...
> Table: bar
> ...
> SSTables in each level: [6, 20/10, 194/100, 862, 0, 0, 0, 0, 0]
> SSTable bytes in each level: [103.91 MiB, 3 GiB, 30.15 GiB, 136.28 GiB, 0 bytes, 0 bytes, 0 bytes, 0 bytes, 0 bytes]
> {noformat}
> This is really useful information, as it helps an operator understand whether
> L0 is getting backed up and whether higher levels are within their expected
> targets (10, 100, 1000, etc.).
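The issue does not spell out how the `count/target` notation in the LCS sample is produced. The sketch below is one reading that is consistent with the sample line. It assumes the default LCS fanout of 10, so level L (for L >= 1) has a target of 10^L SSTables and is annotated only when it exceeds that target. L0 is left unannotated because the sample shows it that way; nodetool's actual rule may differ. The class and method names are hypothetical.

```java
import java.util.StringJoiner;

// Hypothetical sketch of the "count/target" notation: levels >= 1 have a
// target of fanout^level SSTables and are annotated only when over target.
// L0 is printed bare, matching the sample output (an assumption).
final class LcsLevelFormat {
    static String format(int[] countsPerLevel, int fanout) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        long target = 1;
        for (int level = 0; level < countsPerLevel.length; level++) {
            int count = countsPerLevel[level];
            if (level == 0) {
                joiner.add(Integer.toString(count));
                continue;
            }
            target *= fanout; // target is now fanout^level
            joiner.add(count > target ? count + "/" + target : Integer.toString(count));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        int[] counts = {6, 20, 194, 862, 0, 0, 0, 0, 0};
        System.out.println(format(counts, 10));
        // prints [6, 20/10, 194/100, 862, 0, 0, 0, 0, 0]
    }
}
```

Under these assumptions, L1 and L2 are flagged as over target and L3 (862 of 1000) is not. This matches the operator reading described above.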
> As {{UnifiedCompactionStrategy}} dynamically places SSTables in levels based
> on their density, it would also be useful for an operator to know the
> distribution of their SSTables across levels, along with statistics about the
> SSTables within each level.
> I have a proof of concept that I'm working on ([slack
> thread|https://the-asf.slack.com/archives/CJZLTM05A/p1754248321995119]) that
> adds this information by using the UCS {{formLevels}} method to get the
> distribution of SSTables in their associated levels. The output currently
> looks like this:
> {noformat}
> SSTables in each level: [0, 6, 15, 165, 3]
> SSTable bytes in each level: [0 bytes, 1.04 GiB, 2.69 GiB, 67.85 GiB, 1.67 GiB]
> SSTable Average token space in each level: [0.000, 0.500, 0.083, 0.014, 0.008]
> SSTable Average vs Allowed Max Density Ratio in each level: [0.00, 0.73, 0.36, 0.65, 0.10]
> SSTable Max vs Allowed Max Density Ratio in each level: [0.00, 0.97, 0.99, 1.00, 0.10]
> {noformat}
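The description does not show {{formLevels}} itself. As a rough illustration only, the following models density-based level placement under stated assumptions:

* Density is an SSTable's bytes divided by the fraction of the token space it covers.
* Level 0 holds densities below a base bound.
* Each higher level's bound is the previous one multiplied by a single fanout.

Real UCS computes its bounds from its own configuration, so this is a simplified model, not the strategy's implementation. All names and numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of density-based levels, NOT UnifiedCompactionStrategy's
// formLevels. Assumptions: density = bytes / token-space fraction; level 0 is
// [0, base); level L is [base * fanout^(L-1), base * fanout^L).
final class DensityLevels {
    record SSTable(long bytes, double tokenSpaceFraction) {
        SSTable {
            if (!(tokenSpaceFraction > 0 && tokenSpaceFraction <= 1)) {
                throw new IllegalArgumentException("token space fraction must be in (0, 1]");
            }
        }

        // Size normalized by how much of the token space the SSTable covers.
        double density() {
            return bytes / tokenSpaceFraction;
        }
    }

    // Groups SSTables by level; empty levels below the highest used one are kept
    // so per-level arrays line up by index.
    static List<List<SSTable>> formLevels(List<SSTable> sstables, double baseMaxDensity, int fanout) {
        if (baseMaxDensity <= 0 || fanout < 2) {
            throw new IllegalArgumentException("need baseMaxDensity > 0 and fanout >= 2");
        }
        List<List<SSTable>> levels = new ArrayList<>();
        for (SSTable s : sstables) {
            double density = s.density();
            int level = 0;
            double upperBound = baseMaxDensity; // exclusive upper bound of level 0
            while (density >= upperBound) {
                upperBound *= fanout; // each bound is fanout times the previous one
                level++;
            }
            while (levels.size() <= level) {
                levels.add(new ArrayList<>());
            }
            levels.get(level).add(s);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<SSTable> sstables = List.of(
            new SSTable(50, 1.0),    // density 50   -> level 0 [0, 100)
            new SSTable(10, 1.0),    // density 10   -> level 0
            new SSTable(100, 0.5),   // density 200  -> level 1 [100, 400)
            new SSTable(300, 0.25)); // density 1200 -> level 2 [400, 1600)
        System.out.println(formLevels(sstables, 100, 4).stream().map(List::size).toList());
        // prints [2, 1, 1]
    }
}
```

The per-level counts and byte totals in the sample output are then sums over each inner list.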
> This also includes 'average token space per level', which shows how much of
> the token range an SSTable covers on average. That is helpful for estimating
> how much anticompaction may be needed when incrementally repairing this data.
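As a minimal sketch, the 'average token space' value for one level is the mean of each SSTable's covered fraction (0.0 to 1.0). This assumes the fraction is already known per SSTable, and it reports 0 for an empty level, as the sample's first column does. The class name is hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical per-level summary: mean fraction of the node's token space
// covered by each SSTable in the level; 0.0 for an empty level.
final class TokenSpaceStats {
    static double averageTokenSpace(List<Double> fractions) {
        return fractions.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Hypothetical level: 12 SSTables, each covering 1/12 of the token space.
        List<Double> level = Collections.nCopies(12, 1.0 / 12);
        System.out.printf("%.3f%n", averageTokenSpace(level)); // prints 0.083
    }
}
```

The link to anticompaction is presumably that an SSTable covering a wider fraction is more likely to straddle the boundary of a repaired range. Such an SSTable has to be split rather than marked repaired as a whole.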
> Showing the ratio of SSTable densities to the maximum allowed density in each
> level helps an operator understand how close they are to accumulating SSTables
> in a new level.
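For a single level, the two ratio lines could be computed as below. This assumes density = bytes / token-space fraction, and that the caller supplies the level's allowed maximum density (in UCS, that would come from the level's upper bound). Names and numbers are hypothetical.

```java
import java.util.List;

// Sketch of the "Average/Max vs Allowed Max Density Ratio" metrics for one
// level. Assumes density = bytes / token-space fraction and a caller-supplied
// allowed maximum density for the level.
final class DensityRatios {
    record SSTable(long bytes, double tokenSpaceFraction) {
        double density() {
            return bytes / tokenSpaceFraction;
        }
    }

    // Returns {average ratio, max ratio}; both 0 for an empty level.
    static double[] ratios(List<SSTable> level, double allowedMaxDensity) {
        if (allowedMaxDensity <= 0) {
            throw new IllegalArgumentException("allowedMaxDensity must be > 0");
        }
        double sum = 0;
        double max = 0;
        for (SSTable s : level) {
            double ratio = s.density() / allowedMaxDensity;
            sum += ratio;
            max = Math.max(max, ratio);
        }
        double avg = level.isEmpty() ? 0 : sum / level.size();
        return new double[] {avg, max};
    }

    public static void main(String[] args) {
        List<SSTable> level = List.of(
            new SSTable(400, 1.0),   // density 400
            new SSTable(450, 0.5));  // density 900
        double[] r = ratios(level, 1000);
        System.out.printf("avg=%.2f max=%.2f%n", r[0], r[1]); // prints avg=0.65 max=0.90
    }
}
```

A max ratio approaching 1.00, like the 1.00 in the sample's fourth column, indicates that the densest SSTable in that level is nearly at the level's upper bound.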
> I would also like to include:
> * 'Average SSTable size in each level': Given that UCS has minimum and target
> SSTable sizes, it's useful for an operator to know how their SSTables are being
> sized, and they should be mostly uniform by level.
> * 'Shard count in each level': How many shards are assigned to the level.
> I'm not sure if this is feasible yet, but it would be nice to see.
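The proposed 'Average SSTable size in each level' can already be approximated from the two sample lines: divide each level's bytes by its SSTable count, and report 0 for empty levels. The inputs below are the rounded GiB figures from the sample UCS output, so the results are only approximate. The class name is hypothetical.

```java
// Hypothetical helper: average SSTable size per level = level bytes / level
// count, 0 for empty levels.
final class AverageSSTableSize {
    static double[] averagePerLevel(int[] counts, double[] bytes) {
        if (counts.length != bytes.length) {
            throw new IllegalArgumentException("counts and bytes must have the same length");
        }
        double[] avg = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            avg[i] = counts[i] == 0 ? 0 : bytes[i] / counts[i];
        }
        return avg;
    }

    public static void main(String[] args) {
        // Rounded values from the sample UCS output, in GiB.
        int[] counts = {0, 6, 15, 165, 3};
        double[] gib = {0, 1.04, 2.69, 67.85, 1.67};
        for (double avgGib : averagePerLevel(counts, gib)) {
            System.out.printf("%.2f MiB%n", avgGib * 1024);
        }
        // prints 0.00, 177.49, 183.64, 421.08 and 570.03 MiB, one per line
    }
}
```

These averages are simple arithmetic on the sample figures. Whether sizes are uniform within a level would additionally need per-SSTable sizes, not just per-level totals.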
> Some other notes:
> * When using Incremental Repair, SSTables being divided into repaired and
> unrepaired sets tends to skew this data for both LCS and UCS. I'd like to
> separate the metrics out by these repaired sets.
> * What I'm proposing adds quite a bit of output to tablestats, so we need
> to evaluate whether it can be made concise enough to include, or whether the
> data should be exposed some other way.
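Separating the metrics by repair status amounts to partitioning SSTables before grouping them by level. In this minimal sketch, each SSTable carries an already-computed level and a plain boolean flag. Both are assumptions: in Cassandra, repair status comes from SSTable metadata such as repairedAt and pending repair, which is not modeled here. Names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: count SSTables per level separately for repaired and
// unrepaired sets. Level and repair flag are assumed to be precomputed.
final class RepairSplit {
    record SSTable(int level, boolean repaired) {}

    // Maps repaired -> (level -> SSTable count); both keys are always present.
    static Map<Boolean, TreeMap<Integer, Long>> countsByRepairAndLevel(List<SSTable> sstables) {
        return sstables.stream().collect(Collectors.partitioningBy(
            SSTable::repaired,
            Collectors.groupingBy(SSTable::level, TreeMap::new, Collectors.counting())));
    }

    public static void main(String[] args) {
        List<SSTable> sstables = List.of(
            new SSTable(1, true), new SSTable(1, false),
            new SSTable(2, true), new SSTable(2, true));
        Map<Boolean, TreeMap<Integer, Long>> counts = countsByRepairAndLevel(sstables);
        System.out.println("repaired:   " + counts.get(true));  // prints {1=1, 2=2}
        System.out.println("unrepaired: " + counts.get(false)); // prints {1=1}
    }
}
```

The same partition-then-group shape would apply to the byte totals and ratio metrics. This doubles the output, which feeds into the conciseness concern in the last bullet.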
> Given I am still new to UCS, I'll likely iterate a bit on this. I would
> appreciate feedback and suggestions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)