Hello, Igniters!

When diagnosing Ignite, it is usually helpful to have information about its
internals. But currently, in my opinion, there are no convenient tools for
this purpose:

·        Some issues can be solved by analyzing log files. Log files are
useful for dumps, but sometimes they are difficult to read. Also, metrics
of interest can’t be requested at runtime; we have to wait until Ignite
writes them on a timeout or some other event.

·        JMX is useful for scalar metrics. Complex and tabular data can also
be retrieved, but it’s difficult to read, filter and sort without
processing by specialized external tools. For the most frequently used
cases, near-duplicate metrics are created to show data in an easy-to-read
form.

·        Web-console is able to show tabular and complex data. Perhaps
someday web-console will contain all the dashboards necessary for most
problem investigations, but some non-trivial queries will not be covered
anyway. Also, web-console needs additional infrastructure to work.

·        External “home-made” tools can be used for non-trivial cases. They
cover highly specialized cases and usually can’t be used as
general-purpose tools.

Sometimes we are forced to use more than one tool and join the data by hand
(for example, a current thread dump and data from logs).

RDBMSs often provide system views for diagnostic purposes (for example,
DBA_% and V$% in Oracle), which can be queried with SQL. This solution
makes all internal diagnostic information available in a readable form
(with all possible filters and projections) without using any other
internal or external tools. My proposal is to create similar system views
in Ignite.

I have implemented a working prototype (PR: [1]). It contains the following
views:

IGNITE_SYSTEM_VIEWS             Registered system views
IGNITE_INSTANCE                 Ignite instance
IGNITE_JVM_THREADS              JVM threads
IGNITE_JVM_RUNTIME              JVM runtime
IGNITE_JVM_OS                   JVM operating system
IGNITE_CACHES                   Ignite caches
IGNITE_CACHE_CLUSTER_METRICS    Ignite cache cluster metrics
IGNITE_CACHE_NODE_METRICS       Ignite cache node metrics
IGNITE_CACHE_GROUPS             Cache groups
IGNITE_NODES                    Nodes in topology
IGNITE_NODE_HOSTS               Node hosts
IGNITE_NODE_ADDRESSES           Node addresses
IGNITE_NODE_ATTRIBUTES          Node attributes
IGNITE_NODE_METRICS             Node metrics
IGNITE_TRANSACTIONS             Active transactions
IGNITE_TRANSACTION_ENTRIES      Cache entries used by transaction
IGNITE_TASKS                    Active tasks
IGNITE_PART_ASSIGNMENT          Partition assignment map
IGNITE_PART_ALLOCATION          Partition allocation map



Many more useful views can be implemented (executor diagnostics, SPI
diagnostics, etc.).

Some usage examples:

Cache groups and their partitions used by transactions running longer than
5 minutes:

SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS ENTITIES_CNT
FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION



Average CPU load on server nodes grouped by operating system:

SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
FROM INFORMATION_SCHEMA.IGNITE_NODES n
JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na
  ON na.NODE_ID = n.ID AND na.NAME = 'os.name'
JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
WHERE n.IS_CLIENT = false
GROUP BY na.VALUE



Top 5 nodes by puts to cache ‘cache’:

SELECT cm.NODE_ID, cm.CACHE_PUTS
FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
WHERE cm.CACHE_NAME = 'cache'
ORDER BY cm.CACHE_PUTS DESC
LIMIT 5
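


List of registered system views (a minimal sketch: it assumes that
IGNITE_SYSTEM_VIEWS lives in INFORMATION_SCHEMA like the other views; its
column names are not listed in this message, so all columns are selected):

SELECT *
FROM INFORMATION_SCHEMA.IGNITE_SYSTEM_VIEWS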



Is this implementation interesting to anyone else? Are any of the views
redundant? Which additional views should be implemented first? Any other
thoughts or proposals?

[1] https://github.com/apache/ignite/pull/3413
