[ https://issues.apache.org/jira/browse/IGNITE-27000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044689#comment-18044689 ]
Viacheslav Blinov commented on IGNITE-27000:
--------------------------------------------
It seems that this task asks for many metrics that would be scattered all over
the place. If we tried to implement all of them at once, it would result in a
huge PR. I propose splitting this task into a set of subtasks that group the
metrics by functional scope, which makes more sense in terms of testing and review:
Checkpoint metrics:
- Time from requesting the checkpoint read lock until its acquisition - distribution
- Time between read lock acquisition and release - distribution
- Contention - number of times a thread had to wait for the read lock
- Total number of successful lock acquisitions
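A minimal sketch of how the four checkpoint metrics could be collected around a lock, written in Python for brevity rather than against Ignite's Java metrics API. `InstrumentedReadLock`, `Histogram` and the bucket bounds are hypothetical; a plain `threading.Lock` stands in for the checkpoint read lock, which in reality is shared between readers. Contention is detected by trying a non-blocking acquire first.

```python
import bisect
import threading
import time


class Histogram:
    """Hypothetical fixed-bucket distribution: counts[i] holds values <= bounds[i], last slot is overflow."""

    def __init__(self, bounds_ns):
        self.bounds_ns = list(bounds_ns)
        self.counts = [0] * (len(self.bounds_ns) + 1)
        self._lock = threading.Lock()

    def record(self, value_ns):
        idx = bisect.bisect_left(self.bounds_ns, value_ns)
        with self._lock:
            self.counts[idx] += 1


class InstrumentedReadLock:
    """Hypothetical wrapper around any lock exposing acquire(blocking=...) and release()."""

    # Default bounds in nanoseconds: 1 us, 10 us, 100 us, 1 ms, 10 ms (an assumption, not Ignite's).
    def __init__(self, lock, bounds_ns=(1_000, 10_000, 100_000, 1_000_000, 10_000_000)):
        self._lock = lock
        self._local = threading.local()      # per-thread acquisition timestamp (no reentrancy support)
        self._stats_lock = threading.Lock()
        self.acquire_time = Histogram(bounds_ns)  # request -> acquisition
        self.hold_time = Histogram(bounds_ns)     # acquisition -> release
        self.contended = 0                        # times a thread had to wait
        self.acquisitions = 0                     # successful acquisitions

    def acquire(self):
        start = time.monotonic_ns()
        if not self._lock.acquire(blocking=False):
            # The lock was not immediately available: count contention, then wait.
            with self._stats_lock:
                self.contended += 1
            self._lock.acquire()
        acquired = time.monotonic_ns()
        self.acquire_time.record(acquired - start)
        with self._stats_lock:
            self.acquisitions += 1
        self._local.acquired_at = acquired

    def release(self):
        held = time.monotonic_ns() - self._local.acquired_at
        self._lock.release()
        self.hold_time.record(held)
```

Counting contention via a failed non-blocking attempt keeps the uncontended path to a single extra branch; a Java version would more likely use `tryLock()` plus `LongAdder` counters.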
IO metrics:
- Cumulative bytes read from disk
- Cumulative bytes written to disk
- Time spent in physical disk read operations
- Time spent in physical disk write operations
- Failed read operations
- Failed write operations
- Distribution of read operation sizes in bytes
- Distribution of write operation sizes in bytes
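All eight IO metrics can be gathered by wrapping each physical read and write. A Python sketch, assuming a file-like object with `read(n)` and `write(data)`; `IoMetrics`, `InstrumentedFile` and the size buckets are hypothetical names, not Ignite classes. Only `OSError` is counted as a failed operation, and the counters are not thread-safe (a real implementation would use atomic adders).

```python
import bisect
import time


class IoMetrics:
    """Hypothetical holder for cumulative IO counters (single-threaded sketch)."""

    def __init__(self, size_bounds=(4096, 16384, 65536, 1 << 20)):
        self.bytes_read = 0
        self.bytes_written = 0
        self.read_time_ns = 0
        self.write_time_ns = 0
        self.failed_reads = 0
        self.failed_writes = 0
        self.size_bounds = list(size_bounds)
        # Size distributions: slot i counts operations of size <= size_bounds[i], last slot is overflow.
        self.read_sizes = [0] * (len(self.size_bounds) + 1)
        self.write_sizes = [0] * (len(self.size_bounds) + 1)

    def bucket(self, size):
        return bisect.bisect_left(self.size_bounds, size)


class InstrumentedFile:
    """Hypothetical wrapper that times every read/write and records bytes, sizes and failures."""

    def __init__(self, raw, metrics):
        self._raw = raw
        self._m = metrics

    def read(self, n):
        start = time.monotonic_ns()
        try:
            data = self._raw.read(n)
        except OSError:
            self._m.failed_reads += 1
            raise
        finally:
            # Time is accounted for both successful and failed calls.
            self._m.read_time_ns += time.monotonic_ns() - start
        self._m.bytes_read += len(data)
        self._m.read_sizes[self._m.bucket(len(data))] += 1
        return data

    def write(self, data):
        start = time.monotonic_ns()
        try:
            written = self._raw.write(data)
        except OSError:
            self._m.failed_writes += 1
            raise
        finally:
            self._m.write_time_ns += time.monotonic_ns() - start
        self._m.bytes_written += written
        self._m.write_sizes[self._m.bucket(written)] += 1
        return written
```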
Page memory metrics:
- Number of pages read from disk
- Number of pages written to disk
- Page cache hits (total, per last 60 seconds)
- Page cache misses (total, per last 60 seconds)
- Distribution of page acquisition time
- Page cache evictions count
- Current number of dirty pages
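One way to expose both a total and a "last 60 seconds" value for hits and misses is a queue of per-second buckets. A Python sketch under that assumption; `WindowedCounter` and `PageCacheMetrics` are hypothetical names, the injectable clock only exists to make the sketch testable, and it is single-threaded for brevity.

```python
import time
from collections import deque


class WindowedCounter:
    """Total count plus a count over the last `window_s` seconds, one bucket per second."""

    def __init__(self, window_s=60, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.total = 0
        self._buckets = deque()  # (second, count) pairs, oldest first

    def increment(self):
        now = int(self.clock())
        self.total += 1
        if self._buckets and self._buckets[-1][0] == now:
            sec, cnt = self._buckets[-1]
            self._buckets[-1] = (sec, cnt + 1)
        else:
            self._buckets.append((now, 1))
        self._evict(now)

    def recent(self):
        self._evict(int(self.clock()))
        return sum(cnt for _, cnt in self._buckets)

    def _evict(self, now):
        # Keep only seconds now - window_s + 1 .. now.
        while self._buckets and self._buckets[0][0] <= now - self.window_s:
            self._buckets.popleft()


class PageCacheMetrics:
    """Hypothetical page cache counters: windowed hits/misses, evictions, dirty-page gauge."""

    def __init__(self, clock=time.monotonic):
        self.hits = WindowedCounter(clock=clock)
        self.misses = WindowedCounter(clock=clock)
        self.evictions = 0
        self.dirty_pages = 0  # gauge: +1 when a page becomes dirty, -1 when it is written out

    def hit_ratio(self):
        total = self.hits.total + self.misses.total
        return self.hits.total / total if total else 0.0
```

The hit ratio mentioned in the issue description falls out of the two totals, so it does not need its own counter.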
Storage consistency metrics:
- Total number of runConsistently invocations
- Time spent in runConsistently closures - distribution
- Number of I/O operations per runConsistently closure call
- Number of runConsistently calls active right now - gauge
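Since every closure goes through runConsistently, the four consistency metrics could be collected by one wrapper. A Python sketch; `ConsistencyMetrics`, `run_consistently` and the `on_io` hook are hypothetical, counting I/O per call assumes the IO layer can report back to the thread running the closure (a thread-local here), and nested calls are not handled. Durations are kept in a plain list as a stand-in for a bucketed distribution.

```python
import threading
import time


class ConsistencyMetrics:
    """Hypothetical metrics around a runConsistently-style closure runner."""

    def __init__(self):
        self._lock = threading.Lock()
        self._local = threading.local()
        self.invocations = 0   # total number of calls
        self.active = 0        # gauge: calls running right now
        self.durations_ns = [] # stand-in for a distribution metric (unbounded, sketch only)
        self.io_per_call = []  # I/O operations counted per closure call

    def on_io(self):
        # Called by the IO layer; attributes the operation to the closure on this thread, if any.
        if getattr(self._local, "io_count", None) is not None:
            self._local.io_count += 1

    def run_consistently(self, closure):
        with self._lock:
            self.invocations += 1
            self.active += 1
        self._local.io_count = 0
        start = time.monotonic_ns()
        try:
            return closure()
        finally:
            elapsed = time.monotonic_ns() - start
            io_count = self._local.io_count
            self._local.io_count = None  # stop attributing I/O to this call
            with self._lock:
                self.active -= 1
                self.durations_ns.append(elapsed)
                self.io_per_call.append(io_count)
```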
Storage file descriptor metrics:
- Current number of open file handles - gauge
- Optionally:
  - Current number of delta files - gauge
  - Total size of all delta files - gauge
  - Total size of all data files - gauge
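The open-handle gauge has to be maintained at open/close time, while the optional delta/data file gauges could be computed on demand from the storage directory. A Python sketch; `FileHandleMetrics`, `file_size_gauges` and the `is_delta` predicate are hypothetical, since the actual delta file naming scheme is not stated here. Closing the same handle twice would decrement the gauge twice.

```python
import os
import threading


class FileHandleMetrics:
    """Hypothetical gauge of currently open file handles."""

    def __init__(self):
        self._lock = threading.Lock()
        self.open_handles = 0

    def open_file(self, path, mode="rb"):
        f = open(path, mode)  # only counted if the open succeeds
        with self._lock:
            self.open_handles += 1
        return f

    def close_file(self, f):
        f.close()
        with self._lock:
            self.open_handles -= 1


def file_size_gauges(directory, is_delta):
    """Return (delta_count, delta_bytes, data_bytes) for regular files in `directory`."""
    delta_count = delta_bytes = data_bytes = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            size = entry.stat().st_size
            if is_delta(entry.name):
                delta_count += 1
                delta_bytes += size
            else:
                data_bytes += size
    return delta_count, delta_bytes, data_bytes
```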
> Add new "aipersist" engine metrics
> ----------------------------------
>
> Key: IGNITE-27000
> URL: https://issues.apache.org/jira/browse/IGNITE-27000
> Project: Ignite
> Issue Type: Improvement
> Components: storage engines ai3
> Reporter: Ivan Bessonov
> Assignee: Viacheslav Blinov
> Priority: Major
> Labels: ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We need some metrics that would show us how the engine performs, for example
> when it comes to the duration of certain operations. This includes:
> * Time it takes for checkpoint read lock to be acquired.
> A histogram would be nice, maybe something else on top.
> * Time between checkpoint read lock acquiring and releasing.
> Same approach.
> * How many bytes the engine has read or written since the very beginning.
> With no categories, just a total.
> * Duration of a single read [page] operation. Same approach.
> (writes are partially covered by the checkpointer and the compactor)
> * Anything else a developer finds useful. Optional, might be
> done separately. Examples:
> ** Number of open files.
> ** Duration of "open file" or "create file" operations.
> ** IO calls per "runConsistently" closure.
> ** Page memory hit ratio / other page replacement metrics.
> ** etc.
> Enable the "aipersist" metrics source by default; it's not that big but very
> useful.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)