Denis Chudov created IGNITE-12070:
-------------------------------------
Summary: Document the new ability to track system/user time of
transactions
Key: IGNITE-12070
URL: https://issues.apache.org/jira/browse/IGNITE-12070
Project: Ignite
Issue Type: Task
Components: documentation
Reporter: Denis Chudov
Now there is ability to track system/user time of transactions. System time is
the time that is spent for system activities - i.e. time while aquiring locks,
preparing, commiting, etc.User time is the time that is spent for user
activities when client node runs some code while holding transaction.
We have ability to log info about transactions that exceed some threshold
execution timeout, or some percentage of all transactions. Log record in case
of long-running transactions looks like following:
{code:java}
[2019-08-09 13:39:49,130][WARN ][sys-stripe-1-#101%client%][root] Long
transaction time dump [startTime=13:39:47.970, totalTime=1160, systemTime=157,
userTime=1003, cacheOperationsTime=141, prepareTime=15, commitTime=0,
tx=GridNearTxLocal [...]]
{code}
In case of sampling of all transactions:
{code:java}
[2019-08-09 13:39:54,079][INFO ][sys-stripe-2-#102%client%][root] Transaction
time dump [startTime=13:39:54.063, totalTime=15, systemTime=6, userTime=9,
cacheOperationsTime=2, prepareTime=3, commitTime=0, tx=GridNearTxLocal [...]]
{code}
Also some of transactions can be skipped to not overflow the log, information
about this log throttling looks like this:
{code:java}
[2019-08-09 13:39:55,109][INFO ][sys-stripe-0-#100%client%][root] Transaction
time dumps skipped because of log throttling: 2
{code}
There are JMX parameters and JVM options to control this behavior:
1)
JVM option: IGNITE_LONG_TRANSACTION_TIME_DUMP_THRESHOLD
JMX parameter: TransactionsMXBean.longTransactionTimeDumpThreshold
Threshold timeout in milliseconds for long transactions, if transaction exceeds
it, it will be dumped in log with information about how much time did it spent
in system time and user time. Default value is 0. No info about system/user
time of long transactions is dumped in log if this parameter is not set.
2)
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesCoefficient
The coefficient for samples of completed transactions that will be dumped in
log. Must be float value between 0.0 and 1.0 inclusive. Default value is 0.0.
3)
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_PER_SECOND_LIMIT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesPerSecondLimit
The limit of samples of completed transactions that will be dumped in log per
second, if IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT is above 0.0. Must
be integer value greater than 0. Default value is 5.
For the existing long running transaction warning was added information about
current system and user time of transaction:
{code:java}
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] First
10 long running transactions [total=1]
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] >>>
Transaction [startTime=14:10:31.170, curTime=14:10:31.750, systemTime=32,
userTime=548, tx=GridNearTxLocal [...]]
{code}
Also added following metrics to monitor system and user time for single node:
diagnostic.transactions.totalNodeSystemTime - Total transactions system time on
node.
diagnostic.transactions.totalNodeUserTime - Total transactions user time on
node.
diagnostic.transactions.nodeSystemTimeHistogram - Transactions system times on
node represented as histogram.
diagnostic.transactions.nodeUserTimeHistogram - Transactions user times on node
represented as histogram.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)