[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

Mike Adamson (Jira) Thu, 20 Jul 2023 09:06:05 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745152#comment-17745152
 ]


Mike Adamson commented on CASSANDRA-18673:
------------------------------------------

I have completed some performance runs against this branch and the current CEP 
branch. Each run loaded 1B rows using the following schema:
{noformat}
create table if not exists TEMPLATE(keyspace,test).TEMPLATE(table,sai) (
    id bigint,
    time timestamp,
    value int,
    lc int,
    tag text,
    PRIMARY KEY (id)
);
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (time) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (value) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (lc) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (tag) USING 'StorageAttachedIndex';
{noformat}
Data was loaded into the time, value & tag columns.
||Branch||SSTable Size GB||Per-SSTable Index Components GB||Tag Index GB||Time Index GB||Value Index GB||SAI Total GB||
|CEP|48|70|2|7|7|87|
|CASSANDRA-18673|48|13|2|7|7|29|
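
As a rough reading of the table above, the relative savings can be worked out directly from the reported figures. The sketch below is only arithmetic on the rounded GB values; the dictionary keys and helper names are illustrative, not part of the benchmark tooling. Note that the CEP components sum to 86 GB while the reported total is 87 GB, which is presumably a rounding effect in the per-column figures.

```python
# Figures copied from the table above (GB, already rounded in the source).
sizes_gb = {
    "CEP": {"per_sstable": 70, "tag": 2, "time": 7, "value": 7, "reported_total": 87},
    "CASSANDRA-18673": {"per_sstable": 13, "tag": 2, "time": 7, "value": 7, "reported_total": 29},
}


def component_sum(row: dict) -> int:
    """Sum of the individual index sizes, for comparison with the reported total."""
    return row["per_sstable"] + row["tag"] + row["time"] + row["value"]


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


cep = sizes_gb["CEP"]
branch = sizes_gb["CASSANDRA-18673"]

per_sstable_reduction = reduction_pct(cep["per_sstable"], branch["per_sstable"])
total_reduction = reduction_pct(cep["reported_total"], branch["reported_total"])

print(f"per-SSTable components: {per_sstable_reduction:.1f}% smaller")
print(f"SAI total:              {total_reduction:.1f}% smaller")
print("component sums:", component_sum(cep), component_sum(branch))
```

Only the per-SSTable components change between the two branches; the per-column index sizes are identical, so the whole reduction in the SAI total comes from that one column.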

> Reduce size of per-SSTable index components
> -------------------------------------------
>
>                 Key: CASSANDRA-18673
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18673
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/SAI
>            Reporter: Mike Adamson
>            Assignee: Mike Adamson
>            Priority: Urgent
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The current per-SSTable index components are large because the primary keys 
> that are stored in them include the token as part of the byte comparable. The 
> byte comparable puts the token first, meaning that adjacent keys rarely share 
> a prefix and we get very little prefix compression from either the trie or 
> the sorted terms store. 
> We can fix this by removing the token from the primary key serialization. 
> This would allow us to get the prefix compression from the trie and the 
> sorted terms store.
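
The effect described in the issue can be illustrated with a small sketch. It is a simplification under stated assumptions, not the SAI implementation: `stand_in_token` is a hypothetical replacement for the partitioner hash (truncated MD5 rather than Murmur3), keys are plain big-endian `bigint` values rather than Cassandra's real byte-comparable encoding, and the ids are sequential, which real data may not be. The sketch sorts the serialized keys and counts how many leading bytes each key shares with its predecessor, which is what a prefix-compressing structure can avoid storing.

```python
import hashlib


def stand_in_token(partition_key: bytes) -> bytes:
    """Hypothetical stand-in for a partitioner hash: 8 pseudo-random bytes."""
    return hashlib.md5(partition_key).digest()[:8]


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def total_shared_prefix(keys: list[bytes]) -> int:
    """Sum of the prefix bytes each key shares with its predecessor in sorted order."""
    ordered = sorted(keys)
    return sum(shared_prefix_len(p, c) for p, c in zip(ordered, ordered[1:]))


# 1000 sequential ids, serialized as 8-byte big-endian integers (simplified encoding).
ids = [i.to_bytes(8, "big") for i in range(1000)]

token_first = [stand_in_token(k) + k for k in ids]  # token precedes the key
key_only = ids                                      # token removed

shared_token_first = total_shared_prefix(token_first)
shared_key_only = total_shared_prefix(key_only)

print("shared prefix bytes, token first:", shared_token_first)
print("shared prefix bytes, key only:   ", shared_key_only)
```

Because the hashed token is effectively random, neighbouring entries in sorted order diverge within the first byte or two, whereas the key-only form shares most of its leading bytes between neighbours in this toy data set.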



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

