Hello, Igniters!
When diagnosing Ignite, it is usually helpful to get information about
Ignite internals. Currently, in my opinion, there are no convenient tools
for this purpose:
· Some issues can be solved by analyzing log files. Log files are
useful for dumps, but sometimes they are difficult to read. Also,
interesting metrics can’t be requested at runtime on demand; we have to
wait until Ignite writes them on a timeout or some other event.
· JMX is useful for scalar metrics. Complex and tabular data can also
be retrieved, but it is difficult to read, filter, and sort without
processing by specialized external tools. For the most frequently used
cases, near-duplicate metrics are created just to show the data in an
easy-to-read form.
· Web Console can show tabular and complex data. Perhaps someday
Web Console will contain all the dashboards needed for most problem
investigations, but some non-trivial queries will not be covered anyway.
Web Console also needs additional infrastructure to work.
· External “home-made” tools can be used for non-trivial cases. They
cover highly specialized cases and usually can’t be used as
general-purpose tools.
Sometimes we are forced to use more than one tool and join the data by hand
(for example, a current thread dump and data from the logs).
RDBMSs often provide system views for diagnostic purposes (for example,
DBA_% and V$% in Oracle), which can be queried with SQL. This solution makes
all internal diagnostic information available in a readable form (with all
possible filters and projections) without using any other internal or
external tools. My proposal is to create similar system views in Ignite.
I have implemented a working prototype (PR: [1]). It contains the following
views:
IGNITE_SYSTEM_VIEWS           Registered system views
IGNITE_INSTANCE               Ignite instance
IGNITE_JVM_THREADS            JVM threads
IGNITE_JVM_RUNTIME            JVM runtime
IGNITE_JVM_OS                 JVM operating system
IGNITE_CACHES                 Ignite caches
IGNITE_CACHE_CLUSTER_METRICS  Ignite cache cluster metrics
IGNITE_CACHE_NODE_METRICS     Ignite cache node metrics
IGNITE_CACHE_GROUPS           Cache groups
IGNITE_NODES                  Nodes in topology
IGNITE_NODE_HOSTS             Node hosts
IGNITE_NODE_ADDRESSES         Node addresses
IGNITE_NODE_ATTRIBUTES        Node attributes
IGNITE_NODE_METRICS           Node metrics
IGNITE_TRANSACTIONS           Active transactions
IGNITE_TRANSACTION_ENTRIES    Cache entries used by transaction
IGNITE_TASKS                  Active tasks
IGNITE_PART_ASSIGNMENT        Partition assignment map
IGNITE_PART_ALLOCATION        Partition allocation map
Many more useful views could be implemented (executor diagnostics,
SPI diagnostics, etc.).
Some usage examples:
Cache groups and their partitions that are used by transactions running
for more than 5 minutes:
SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS ENTITIES_CNT
FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
Average CPU load on server nodes grouped by operating system:
SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
FROM INFORMATION_SCHEMA.IGNITE_NODES n
JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na
  ON na.NODE_ID = n.ID AND na.NAME = 'os.name'
JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
WHERE n.IS_CLIENT = false
GROUP BY na.VALUE
Top 5 nodes by puts to cache ‘cache’:
SELECT cm.NODE_ID, cm.CACHE_PUTS
FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
WHERE cm.CACHE_NAME = 'cache'
ORDER BY cm.CACHE_PUTS DESC
LIMIT 5
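As a simpler starting point than the joins above, the two statements below
sketch a first look at a cluster. They rely only on names that appear in
this message (the INFORMATION_SCHEMA schema, the IGNITE_SYSTEM_VIEWS and
IGNITE_NODES views, and the ID and IS_CLIENT columns) and assume the
prototype from [1] is applied. SELECT * is used in the first statement
because the columns of IGNITE_SYSTEM_VIEWS are not listed here; the
SERVER_NODES alias is only illustrative.

```sql
-- List the system views registered by the prototype.
-- Column names are not given in this message, so no projection is assumed.
SELECT * FROM INFORMATION_SCHEMA.IGNITE_SYSTEM_VIEWS;

-- Count server nodes in the topology (ID and IS_CLIENT as used above).
SELECT COUNT(n.ID) AS SERVER_NODES
FROM INFORMATION_SCHEMA.IGNITE_NODES n
WHERE n.IS_CLIENT = false;
```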
Is this implementation interesting to anyone else? Are any of the views
redundant? Which additional views should be implemented first? Any
other thoughts or proposals?
[1] https://github.com/apache/ignite/pull/3413