[jira] [Comment Edited] (CASSANDRA-20250) Optimize Counter, Meter and Histogram metrics using thread local counters

Dmitry Konstantinov (Jira) Thu, 20 Feb 2025 06:41:17 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928812#comment-17928812
 ]


Dmitry Konstantinov edited comment on CASSANDRA-20250 at 2/20/25 2:38 PM:
--------------------------------------------------------------------------

I have committed the recycling logic based on PhantomReferences. During the 
implementation I realized that it is not easy to combine the PhantomReference 
approach with the periodic iteration that checks thread state: to check the 
thread state I need a reference to the Thread, and that reference would prevent 
Thread objects from being garbage collected. It looks possible to work around 
this by using WeakReferences, by keeping the thread id, or by checking the 
thread state in some other way, but I think the complexity is not worth it, so 
I removed the iteration logic.
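To illustrate why a PhantomReference avoids the GC problem described above, here is a minimal sketch under stated assumptions (it is not the committed Cassandra code; the class and method names PhantomRecycledCounter, inc, recycle and sum are hypothetical). Each thread gets its own counter cell. A PhantomReference to the owning Thread keeps only the cell strongly reachable, never the Thread, so the Thread can still be collected. Once it is, the reference shows up on a ReferenceQueue and its final count is folded into a shared total.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not the Cassandra code): per-thread counter cells whose
// values are folded into a shared "retired" total once the owning Thread is
// only phantom-reachable, i.e. after it has been garbage collected.
class PhantomRecycledCounter {
    // Per-thread cell: only the owning thread writes it, readers use volatile reads.
    static final class Cell {
        volatile long value;
    }

    // Phantom reference to the owning Thread. It holds the Cell strongly but never
    // the Thread itself, so it does not keep the Thread object alive.
    private static final class ThreadRef extends PhantomReference<Thread> {
        final Cell cell;

        ThreadRef(Thread owner, Cell cell, ReferenceQueue<Thread> queue) {
            super(owner, queue);
            this.cell = cell;
        }
    }

    private final ReferenceQueue<Thread> queue = new ReferenceQueue<>();
    // Keeps the ThreadRef objects reachable; an unreachable Reference is never enqueued.
    private final Set<ThreadRef> live = ConcurrentHashMap.newKeySet();
    private final AtomicLong retired = new AtomicLong();
    private final ThreadLocal<Cell> local = ThreadLocal.withInitial(this::register);

    private Cell register() {
        Cell cell = new Cell();
        live.add(new ThreadRef(Thread.currentThread(), cell, queue));
        return cell;
    }

    void inc(long delta) {
        Cell cell = local.get();
        cell.value += delta; // single writer, so no CAS is needed
    }

    // Drains references of collected threads: their final counts move into
    // 'retired' and their cells are dropped. A sum() running concurrently may
    // briefly miss a cell between the two steps; a real version needs a stricter handoff.
    void recycle() {
        ThreadRef ref;
        while ((ref = (ThreadRef) queue.poll()) != null) {
            if (live.remove(ref)) {
                retired.addAndGet(ref.cell.value);
            }
        }
    }

    long sum() {
        recycle();
        long total = retired.get();
        for (ThreadRef ref : live) {
            total += ref.cell.value;
        }
        return total;
    }
}
```

The total is the same whether or not a terminated thread has been collected yet: before GC its cell is still in the live set, and after GC it has been moved into the retired total. No reference to the Thread is kept, so nothing here can check thread state. That is the trade-off the comment describes.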

I checked FastThreadLocal.onRemoval() and it works fine for 
FastThreadLocalThreads. I suppose all or almost all threads in the Cassandra 
server are based on FastThreadLocalThread, so I integrated this logic as well. 
In total, it is now a combination of PhantomReference + FastThreadLocal.
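When two cleanup paths coexist, for example Netty's protected FastThreadLocal.onRemoval(V) hook for FastThreadLocalThreads and PhantomReference processing for all other threads, the same per-thread slot can be reached by both paths. The sketch below assumes that a slot should be returned to the pool exactly once and guards the release with a compare-and-set. It does not use Netty. DualPathSlot and its IntConsumer callback are hypothetical names, and the two paths are simulated by plain calls to release().

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical sketch: a per-thread metric slot that may be released by two
// independent cleanup paths, an eager one (e.g. a thread-local removal hook run
// when the thread finishes) and a lazy one (e.g. phantom-reference processing
// after GC). The CAS guarantees the slot id is freed exactly once.
class DualPathSlot {
    final int id;
    private final IntConsumer freeId;
    private final AtomicBoolean released = new AtomicBoolean();

    DualPathSlot(int id, IntConsumer freeId) {
        this.id = id;
        this.freeId = freeId;
    }

    // Idempotent: returns true only for the caller that actually freed the id.
    boolean release() {
        if (released.compareAndSet(false, true)) {
            freeId.accept(id);
            return true;
        }
        return false;
    }
}
```

Whichever path runs first frees the id. The later path becomes a no-op, so an id cannot be handed out twice while another thread still uses it.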

Additionally, I have replaced ConcurrentSkipListSet with BitSet for the set of 
free metric ids, as suggested before; this should reduce the memory footprint.
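A minimal sketch of a free-id set backed by java.util.BitSet, assuming a set bit means "this id is free" and that ids which were never allocated are handed out from a simple counter. FreeMetricIds is a hypothetical name. BitSet is not thread-safe, so this sketch synchronizes every method.

```java
import java.util.BitSet;

// Hypothetical sketch: free metric ids tracked in a java.util.BitSet
// (bit set = id is free). BitSet is not thread-safe, so methods are synchronized.
class FreeMetricIds {
    private final BitSet free = new BitSet();
    private int nextNew; // lowest id that has never been handed out

    // Reuses the lowest released id first, which keeps per-thread arrays
    // indexed by metric id as short as possible.
    synchronized int allocate() {
        int id = free.nextSetBit(0);
        if (id >= 0) {
            free.clear(id);
            return id;
        }
        return nextNew++;
    }

    synchronized void release(int id) {
        if (id < 0 || id >= nextNew || free.get(id)) {
            throw new IllegalArgumentException("id not allocated: " + id);
        }
        free.set(id);
    }
}
```

A BitSet needs roughly one bit per id. A ConcurrentSkipListSet allocates a node per element, plus a boxed Integer for most values and occasional index nodes, which is likely where the memory saving comes from.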

The remaining things for me to do before the review phase: add javadocs, 
improve test coverage and check the correctness of the concurrent logic in some 
scenarios.


> Optimize Counter, Meter and Histogram metrics using thread local counters
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_profile_cpu.html, 
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html, 
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip, 
> cpu_profile_insert.html, image-2025-02-18-23-22-19-983.png, jmh-result.json, 
> vmstat.log, vmstat_without_metrics.log
>
>
> Cassandra collects a lot of metrics, and many of them are collected per 
> table, so the number of metric instances is multiplied by the number of 
> tables. On one hand this gives better observability; on the other hand, 
> metrics are not free, and there is an overhead associated with them:
> 1) CPU overhead: for a simple CPU-bound load I already see about 5.5% of 
> total CPU spent on metrics in CPU flame graphs for a read load and 11% for a 
> write load. 
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern). The 
> flame graph was captured using the async-profiler build: 
> async-profiler-3.0-29ee888-linux-x64
> 2) Memory overhead: memory is spent on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, and on MBeans (String 
> concatenation within object names is a major cause: a new String is created 
> for each table + metric name combination).
> LongAdder is used by Dropwizard Counter, Meter and Histogram metrics for 
> counting. It has a severe memory overhead, and while it scales better than 
> AtomicLong, we still pay some cost for the concurrent operations. 
> Additionally, Meter behaves non-optimally because it counts the same events 
> several times (a single mark() updates the total count and each of the 
> moving-average rates separately).
> The idea (suggested by [~benedict]) is to switch to thread-local counters 
> stored in a common per-thread array to reduce memory overhead. This way we 
> can avoid concurrent update overhead and contention and reduce the memory 
> footprint as well.
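The idea of one shared per-thread array can be sketched as follows, under stated assumptions: each thread owns a single AtomicLongArray holding all of its counters, indexed by a metric id, and a read sums that index across every thread's array. ThreadLocalCounterArrays, inc and sum are hypothetical names, the capacity is fixed, and the arrays of dead threads are never reclaimed here; recycling those arrays is the subject of the comment above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: every thread owns one array holding all of its counters,
// indexed by a metric id. Only the owning thread writes its array, so an
// increment needs no atomic read-modify-write; a read sums one index across
// the arrays of all threads.
class ThreadLocalCounterArrays {
    // Arrays of every thread that has touched a counter. This sketch never
    // removes the arrays of dead threads.
    private final Set<AtomicLongArray> allArrays = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<AtomicLongArray> local;

    ThreadLocalCounterArrays(int maxMetrics) {
        // Fixed capacity for simplicity; a real implementation would grow on demand.
        local = ThreadLocal.withInitial(() -> {
            AtomicLongArray array = new AtomicLongArray(maxMetrics);
            allArrays.add(array);
            return array;
        });
    }

    void inc(int metricId, long delta) {
        AtomicLongArray array = local.get();
        // Plain read followed by an ordered store: safe because this thread is
        // the array's only writer; readers eventually observe the new value.
        array.lazySet(metricId, array.get(metricId) + delta);
    }

    long sum(int metricId) {
        long total = 0;
        for (AtomicLongArray array : allArrays) {
            total += array.get(metricId);
        }
        return total;
    }
}
```

Compared with one LongAdder per metric, the per-counter cost becomes one long slot per thread instead of a separate object graph per metric. Writes never contend because no two threads write the same array. Reads become more expensive, which is usually acceptable because metrics are read far less often than they are updated.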



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

