[
https://issues.apache.org/jira/browse/CASSANDRA-18534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773559#comment-17773559
]
Stefan Miklosovic edited comment on CASSANDRA-18534 at 10/10/23 6:49 AM:
-------------------------------------------------------------------------
I would prefer to merge CASSANDRA-18872 first, which removes crc_check_chance
from the compression options. That means 5.0 will _not_ have crc_check_chance
in compression anymore.
You could then rebase this work against 5.0 and tweak the FileHandler builder
to propagate the sstable format option there, so that the two are aligned.
[~maxwellguo] [~blambov] how does this sound to you?
BTW I think this ticket as a whole needs to have a ML thread. We are changing
CQL here and it would be great to involve more people in this.
What seems to be a little bit "strange" to me is that we chose these properties:
row_index_granularity
bloom_filter_fp_chance
crc_check_chance
min/max_index_interval
But _why exactly these_? Also, what does crc_check_chance, for example, have to
do with the _sstable format_? There is no "format" behind it. crc_check_chance
(like bloom_filter_fp_chance) is just a _probability_ with which we perform
some operation. That is an operational parameter; we are not _formatting an
sstable_ as such. Maybe it is just a matter of naming, but I find this
important to mention.
Also, do you think it is possible and useful to make sstable_format contain
custom parameters? If we have a way to specify a custom SSTable format by
implementing AbstractSSTableFormat, then such a format might accept additional
parameters, which would be added to sstable_format like this:
{code}
... sstable_format = {"type": "mytype", "myparameter": "abc"}
{code}
That means we would not need to implement every custom parameter out there for
every format. The tricky part is that if we allow custom parameters to be
specified, then altering them would alter the schema, producing a different
schema version which would need to be propagated to the cluster.
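To make the idea more concrete, here is a minimal sketch of how such a map could be split into the format type and its custom parameters. Everything in it is hypothetical: the class SSTableFormatOptionsSketch, the parse method and the set of accepted parameter names are made up for illustration and are not existing Cassandra APIs. The assumption is only that each format declares which extra parameter names it accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Cassandra.
final class SSTableFormatOptionsSketch {

    /** Parsed sstable_format map: the format type plus its format-specific parameters. */
    static final class ParsedOptions {
        final String type;
        final Map<String, String> custom;

        ParsedOptions(String type, Map<String, String> custom) {
            this.type = type;
            this.custom = custom;
        }
    }

    /**
     * Splits the raw option map into "type" and custom parameters.
     * "accepted" is the set of parameter names the chosen format declares
     * it understands (an assumption of this sketch).
     */
    static ParsedOptions parse(Map<String, String> raw, Set<String> accepted) {
        String type = raw.get("type");
        if (type == null)
            throw new IllegalArgumentException("sstable_format requires a \"type\" entry");

        Map<String, String> custom = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if (e.getKey().equals("type"))
                continue; // already handled above
            if (!accepted.contains(e.getKey()))
                throw new IllegalArgumentException(
                    "Unknown parameter for format " + type + ": " + e.getKey());
            custom.put(e.getKey(), e.getValue());
        }
        return new ParsedOptions(type, custom);
    }
}
```

With this shape, the example above would be parsed by passing {"type": "mytype", "myparameter": "abc"} together with the set {"myparameter"}; an unknown key is rejected at schema-change time rather than silently stored.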
> Make sstable format configurable per table
> ------------------------------------------
>
> Key: CASSANDRA-18534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18534
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Schema, Local/SSTable
> Reporter: Branimir Lambov
> Assignee: Maxwell Guo
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Some SSTable format settings need to be configurable per table for better
> efficiency. This includes:
> - {{row_index_granularity}}
> - {{bloom_filter_fp_chance}}
> - {{crc_check_chance}}
> - {{min/max_index_interval}}
> Some of these are currently configurable using direct properties of tables.
> Having them as format properties makes better sense and should also support
> specifying useable combinations of settings, e.g.
> {code:java}
> CREATE TABLE ... WITH sstable_format = "bti-fast";
> CREATE TABLE ... WITH sstable_format = "bti-small";
> {code}
> where {{bti-fast}} and {{bti-small}} can be defined in {{cassandra.yaml}}
> e.g. as
> {code:java}
> sstable.format.options:
>   - bti-fast:
>       row_index_granularity: 1kiB
>       bloom_filter_fp_chance: 0.01
>   - bti-small:
>       row_index_granularity: 32kiB
>       bloom_filter_fp_chance: 0.1
> {code}