Hello, Igniters!

When diagnosing Ignite, it is usually helpful to have access to information about Ignite internals. But currently, in my opinion, there are no convenient tools for this purpose:
· Some issues can be solved by analyzing log files. Log files are useful for dumps, but they are sometimes difficult to read. Also, interesting metrics can’t be requested at runtime; we have to wait until Ignite writes them on a timeout or some other event.
· JMX is useful for scalar metrics. Complex and tabular data can also be retrieved, but it is difficult to read, filter and sort without processing by specialized external tools. For the most frequently used cases, near-duplicate metrics are created just to show the data in an easy-to-read form.
· Web console can show tabular and complex data. Perhaps someday it will contain all the dashboards needed for most problem investigations, but some non-trivial queries will not be covered anyway. Web console also needs additional infrastructure to work.
· External “home-made” tools can be used for non-trivial cases. They cover highly specialized cases and usually can’t be used as general-purpose tools.

Sometimes we are forced to use more than one tool and join the data by hand (for example, a current thread dump and data from the logs).

RDBMSs often provide system views for diagnostic purposes (for example, DBA_% and V$% in Oracle), which can be queried with SQL. This approach makes all internal diagnostic information available in a readable form (with all possible filters and projections) without using any other internal or external tools. My proposal is to create similar system views in Ignite.

I have implemented a working prototype (PR: [1]).
It contains the following views:

IGNITE_SYSTEM_VIEWS           Registered system views
IGNITE_INSTANCE               Ignite instance
IGNITE_JVM_THREADS            JVM threads
IGNITE_JVM_RUNTIME            JVM runtime
IGNITE_JVM_OS                 JVM operating system
IGNITE_CACHES                 Ignite caches
IGNITE_CACHE_CLUSTER_METRICS  Ignite cache cluster metrics
IGNITE_CACHE_NODE_METRICS     Ignite cache node metrics
IGNITE_CACHE_GROUPS           Cache groups
IGNITE_NODES                  Nodes in topology
IGNITE_NODE_HOSTS             Node hosts
IGNITE_NODE_ADDRESSES         Node addresses
IGNITE_NODE_ATTRIBUTES        Node attributes
IGNITE_NODE_METRICS           Node metrics
IGNITE_TRANSACTIONS           Active transactions
IGNITE_TRANSACTION_ENTRIES    Cache entries used by transaction
IGNITE_TASKS                  Active tasks
IGNITE_PART_ASSIGNMENT        Partition assignment map
IGNITE_PART_ALLOCATION        Partition allocation map

Many more useful views can be implemented (executor diagnostics, SPI diagnostics, etc.).

Some usage examples:

Cache groups and their partitions that are used by transactions running for more than 5 minutes:

    SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS ENTITIES_CNT
    FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
    JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
    JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
    JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
    WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
    GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION

Average CPU load on server nodes, grouped by operating system:

    SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
    FROM INFORMATION_SCHEMA.IGNITE_NODES n
    JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID AND na.NAME = 'os.name'
    JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
    WHERE n.IS_CLIENT = false
    GROUP BY na.VALUE

Top 5 nodes by puts to cache ‘cache’:

    SELECT cm.NODE_ID, cm.CACHE_PUTS
    FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
    WHERE cm.CACHE_NAME = 'cache'
    ORDER BY cm.CACHE_PUTS DESC
    LIMIT 5

Is this implementation interesting to anyone else? Are any of the views redundant? Which additional views should be implemented first? Any other thoughts or proposals?

[1] https://github.com/apache/ignite/pull/3413
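
P.S. For anyone who wants to try the shape of the "average CPU load by operating system" query without building the prototype, here is a minimal sketch that runs it against stand-in tables. Everything here is an assumption made for illustration: SQLite replaces Ignite, the tables contain only the columns that the query references, the rows are made-up sample data, and the helper name avg_cpu_load_by_os is hypothetical. It says nothing about how the prototype itself is implemented.

```python
import sqlite3

# Query from the usage examples. Two deliberate differences for SQLite:
# IS_CLIENT is compared with 0 (booleans are stored as integers there),
# and ORDER BY is added so the result order is stable.
QUERY = """
SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
FROM INFORMATION_SCHEMA.IGNITE_NODES n
JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na
  ON na.NODE_ID = n.ID AND na.NAME = 'os.name'
JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm
  ON nm.NODE_ID = n.ID
WHERE n.IS_CLIENT = 0
GROUP BY na.VALUE
ORDER BY na.VALUE
"""

# Made-up sample rows (not real cluster data).
SAMPLE_NODES = [("n1", 0), ("n2", 0), ("n3", 0), ("n4", 1)]  # (ID, IS_CLIENT)
SAMPLE_ATTRIBUTES = [                                         # (NODE_ID, NAME, VALUE)
    ("n1", "os.name", "Linux"),
    ("n2", "os.name", "Linux"),
    ("n3", "os.name", "Windows"),
    ("n4", "os.name", "Linux"),
]
SAMPLE_METRICS = [("n1", 0.25), ("n2", 0.75), ("n3", 0.125), ("n4", 0.9)]  # (NODE_ID, AVG_CPU_LOAD)


def avg_cpu_load_by_os(nodes, attributes, metrics):
    """Load the rows into stand-in tables and run QUERY against them."""
    conn = sqlite3.connect(":memory:")
    try:
        # A second in-memory database attached under the schema name used in the query.
        conn.execute("ATTACH DATABASE ':memory:' AS INFORMATION_SCHEMA")
        conn.execute(
            "CREATE TABLE INFORMATION_SCHEMA.IGNITE_NODES (ID TEXT, IS_CLIENT INTEGER)")
        conn.execute(
            "CREATE TABLE INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES "
            "(NODE_ID TEXT, NAME TEXT, VALUE TEXT)")
        conn.execute(
            "CREATE TABLE INFORMATION_SCHEMA.IGNITE_NODE_METRICS "
            "(NODE_ID TEXT, AVG_CPU_LOAD REAL)")
        conn.executemany(
            "INSERT INTO INFORMATION_SCHEMA.IGNITE_NODES VALUES (?, ?)", nodes)
        conn.executemany(
            "INSERT INTO INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES VALUES (?, ?, ?)",
            attributes)
        conn.executemany(
            "INSERT INTO INFORMATION_SCHEMA.IGNITE_NODE_METRICS VALUES (?, ?)", metrics)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for os_name, node_cnt, avg_load in avg_cpu_load_by_os(
            SAMPLE_NODES, SAMPLE_ATTRIBUTES, SAMPLE_METRICS):
        print(os_name, node_cnt, avg_load)
```

The client node (n4) is filtered out by the IS_CLIENT condition, so only the three server nodes contribute to the per-OS counts and averages.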