[ 
https://issues.apache.org/jira/browse/CASSANDRA-20465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Konstantinov updated CASSANDRA-20465:
--------------------------------------------
    Description: 
Before the TCM changes, invoking org.apache.cassandra.schema.TableMetadataRef#get 
was cheap (it just returned a field value). Now it does a lookup from Schema 
every time, which involves a BTree search plus a not very cheap check of 
whether the keyspace is a system keyspace.

Diff:
!image-2025-03-20-22-43-09-044.png|width=300!
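
To make the cost difference concrete, here is a simplified model. It is not Cassandra code: RefCostSketch, Meta, FieldRef and LookupRef are hypothetical names, and a TreeMap stands in for the BTree search done through Schema. It only shows that a lookup-style reference pays the search on every get() call, while a field-style reference does not.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical model, not Cassandra code: contrasts a reference that returns a
// stored field with one that resolves its target through a lookup on each call.
final class RefCostSketch {
    // Stand-in for TableMetadata.
    static final class Meta {
        final String name;
        Meta(String name) { this.name = name; }
    }

    // Old behaviour as described in the issue: get() is a plain field read.
    static final class FieldRef implements Supplier<Meta> {
        private final Meta meta;
        FieldRef(Meta meta) { this.meta = meta; }
        @Override public Meta get() { return meta; }
    }

    // New behaviour, simplified: every get() searches a sorted map.
    static final class LookupRef implements Supplier<Meta> {
        private final Map<String, Meta> schema;
        private final String key;
        int lookups = 0; // number of times the search path ran

        LookupRef(Map<String, Meta> schema, String key) {
            this.schema = schema;
            this.key = key;
        }

        @Override public Meta get() {
            lookups++;
            return schema.get(key);
        }
    }

    // Calls get() the given number of times and reports how many searches ran.
    static int lookupsFor(int calls) {
        Map<String, Meta> schema = new TreeMap<>();
        schema.put("ks.t", new Meta("ks.t"));
        LookupRef ref = new LookupRef(schema, "ks.t");
        for (int i = 0; i < calls; i++)
            ref.get();
        return ref.lookups;
    }

    public static void main(String[] args) {
        System.out.println(new FieldRef(new Meta("ks.t")).get().name);
        System.out.println(lookupsFor(1000));
    }
}
```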

Several places in the code use TableMetadataRef#get and assume it is cheap.
Currently about 0.93% of total CPU is spent in this operation. For the 
(compaction + flush) threads the share is 5.4%, and for compaction threads 
only it is 9.4% ( [^5.1_cpu.html] ).

It is not clear whether the overhead inside TableMetadataRef#get itself is easy 
to reduce, but in many cases we can avoid it with a small adjustment on the 
caller side so that TableMetadataRef#get is invoked less often:

1) org.apache.cassandra.db.ColumnFamilyStore#isRowCacheEnabled - the row cache 
is fully disabled by default, so it is probably better to check whether it is 
enabled as the first condition:
!image-2025-03-20-22-46-34-571.png|width=300!

2) org.apache.cassandra.db.memtable.TrieMemtable#getFlushSet - we can look up 
the metadata once at the beginning of the getFlushSet logic

!image-2025-03-20-22-52-40-818.png|width=300! 
!image-2025-03-20-22-53-25-001.png|width=300!

3) org.apache.cassandra.io.sstable.SSTableIdentityIterator.create - consider 
whether we can retrieve TableMetadata at the beginning of a compaction and 
reuse it for the duration of that compaction.

!image-2025-03-20-22-56-31-298.png|width=300!

4) org.apache.cassandra.io.sstable.keycache.KeyCacheSupport.getCacheKey - 
consider whether we can retrieve only the needed id/indexName fields once (at 
least id does not look like a parameter that changes dynamically).
!image-2025-03-20-22-58-00-837.png|width=300!
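
The caller-side adjustments in items 1 and 2 can be sketched as follows. This is a minimal illustration under assumptions, not the actual patch: CallerSideSketch, Meta, CountingRef, the rowCacheCapacity parameter and the method bodies are hypothetical stand-ins. CountingRef counts get() calls so the saving is visible.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not Cassandra code: two ways a caller can invoke an
// expensive get() less often.
final class CallerSideSketch {
    // Stand-in for TableMetadata with a single per-table caching flag.
    static final class Meta {
        final boolean cachesRows;
        Meta(boolean cachesRows) { this.cachesRows = cachesRows; }
    }

    // Stand-in for TableMetadataRef; counts how often get() is invoked.
    static final class CountingRef implements Supplier<Meta> {
        private final Meta meta;
        int calls = 0;
        CountingRef(Meta meta) { this.meta = meta; }
        @Override public Meta get() { calls++; return meta; }
    }

    // Item 1: test the cheap global setting first. && short-circuits, so
    // ref.get() is never reached while the cache capacity is zero.
    static boolean isRowCacheEnabled(long rowCacheCapacity, Supplier<Meta> ref) {
        return rowCacheCapacity > 0 && ref.get().cachesRows;
    }

    // Item 2: resolve the metadata once and reuse the local inside the loop,
    // instead of calling ref.get() for every element.
    static int countCacheable(List<String> partitions, Supplier<Meta> ref) {
        Meta meta = ref.get();
        int n = 0;
        for (String p : partitions)
            if (meta.cachesRows && !p.isEmpty())
                n++;
        return n;
    }

    public static void main(String[] args) {
        CountingRef ref = new CountingRef(new Meta(true));
        isRowCacheEnabled(0, ref);
        countCacheable(List.of("a", "b", "c"), ref);
        System.out.println(ref.calls);
    }
}
```

One design consequence of hoisting the lookup: the loop works with the metadata as it was when the operation started, so a schema change that lands mid-operation is not observed. Whether that is acceptable has to be decided per call site.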

  was:
Before the TCM changes, invoking org.apache.cassandra.schema.TableMetadataRef#get 
was cheap (it just returned a field value). Now it does a lookup from Schema 
every time, which involves a BTree search plus a not very cheap check of 
whether the keyspace is a system keyspace.

Diff:
!image-2025-03-20-22-43-09-044.png|width=300!

Several places in the code use TableMetadataRef#get and assume it is cheap.
Currently about 0.93% of total CPU is spent in this operation. For the 
(compaction + flush) threads the share is 5.4%, and for compaction threads 
only it is 9.4% ( [^5.1_cpu.html] ).

It is not clear whether the overhead inside TableMetadataRef#get itself is easy 
to reduce, but in many cases we can avoid it with a small adjustment on the 
caller side so that TableMetadataRef#get is invoked less often:

1) org.apache.cassandra.db.ColumnFamilyStore#isRowCacheEnabled - the row cache 
is fully disabled by default, so it is probably better to check whether it is 
enabled as the first condition:
!image-2025-03-20-22-46-34-571.png|width=300!

2) org.apache.cassandra.db.memtable.TrieMemtable#getFlushSet - we can look up 
the metadata once at the beginning of the getFlushSet logic

!image-2025-03-20-22-52-40-818.png|width=300! 
!image-2025-03-20-22-53-25-001.png|width=300!

3) org.apache.cassandra.io.sstable.SSTableIdentityIterator.create - consider 
whether we can keep TableMetadata for the duration of a compaction.
!image-2025-03-20-22-56-31-298.png|width=300!

4) org.apache.cassandra.io.sstable.keycache.KeyCacheSupport.getCacheKey - 
consider whether we can retrieve only the needed id/indexName fields once.
!image-2025-03-20-22-58-00-837.png|width=300!


> Reduce runtime overhead of org.apache.cassandra.schema.TableMetadataRef#get 
> usage 
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20465
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20465
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Schema, Transactional Cluster Metadata
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_cpu.html, image-2025-03-20-22-43-09-044.png, 
> image-2025-03-20-22-46-34-571.png, image-2025-03-20-22-52-40-818.png, 
> image-2025-03-20-22-53-25-001.png, image-2025-03-20-22-56-31-298.png, 
> image-2025-03-20-22-58-00-837.png
>


