Hi Alex,

System views could be an extremely valuable addition to Ignite. Ideally, a user should be able to monitor and manage the state of the whole cluster from a single SQL command line. We have had plans to implement this for a very long time. However, it is a very sensitive task which has to take a lot of moving pieces into account, such as usability, consistency, performance, security, etc.
Let me point out several major concerns I see at the moment:

1) Usability: INFORMATION_SCHEMA

This schema is part of the ANSI SQL standard. When creating system views, some vendors prefer to store them in a completely different predefined schema (Oracle, MS SQL). Others prefer to keep them in INFORMATION_SCHEMA directly. Both approaches could work. However, the latter breaks separation of concerns - we store ordinary metadata next to possibly sensitive system data. It also makes security management more complex - system data is very sensitive, and we can no longer simply grant a user access to INFORMATION_SCHEMA as a whole; instead, we have to grant access on a per-view basis. For this reason my preference is to store system views in a separate schema, not in INFORMATION_SCHEMA.

2) Consistency: local data

One of the implemented views is GridH2SysViewImplInstance. Normally SQL users communicate with Ignite through JDBC/ODBC drivers. These drivers are connected to a single node, typically a client node. Moreover, we will introduce a high-availability feature where drivers will be able to connect to any address from a predefined list. This renders such a view useless, as you do not know which node you are connected to. Also, node-local data cannot be joined in the general case - you will receive different results on different nodes. The same goes for transactions, JVM info, etc.

3) Performance

Suppose we fixed the consistency of transactions and now this view shows transactions in the whole cluster with the possibility to filter them by node - this is what a user would expect out of the box. Then another problem appears - performance. How would we collect the necessary data? How would we handle joins, when a particular view could be scanned multiple times during query execution? How do we achieve sensible consistency? Most probably we would collect remote data once when the query is started, cache it somehow at the query session level, and then re-use it during joins. But again, this should be discussed separately.

4) Security: JVM info

We should define clear boundaries for what info is exposed. JVM data, along with running threads, is critically sensitive information. We should not expose it until we have authorization capabilities.

In order to start moving this code from a prototype to a production state, we should start with the simplest and most consistent views, e.g. IGNITE_CACHES. Let's move it to a separate PR, review the infrastructure code, review the view implementation, agree on proper naming and placement, and merge it. Then each and every view (or group of related views) should be discussed and reviewed separately.

As far as the node-local stuff goes, maybe we should move it to a separate schema, or mark it with a special prefix. E.g. "IGNITE.TRANSACTIONS" - all transactions in the cluster, "IGNITE.LOCAL_TRANSACTIONS" - transactions on the local node. In this case we will be able to merge the "local" stuff shortly, and implement the more complex but at the same time much more useful distributed stuff later on.
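To make that concrete, here is a rough sketch of how such queries might look with a dedicated IGNITE schema. The view and column names below (including NODE_ID) are purely illustrative - none of this is implemented yet.

All transactions in the cluster, no matter which node the driver happens to be connected to:

SELECT XID, NODE_ID, START_TIME
FROM IGNITE.TRANSACTIONS
WHERE START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())

Only the transactions on the local node:

SELECT XID, START_TIME
FROM IGNITE.LOCAL_TRANSACTIONS

With such a naming split the scope of the data is visible right in the query text, and the distributed views can be added later without breaking the local ones.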
Makes sense?

Vladimir.

On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov <plehanov.a...@gmail.com> wrote:
> Hello, Igniters!
>
> For Ignite diagnostics it is usually helpful to get some Ignite internals
> information. But currently, in my opinion, there are no convenient tools
> for this purpose:
>
> · Some issues can be solved by analyzing log files. Log files are
> useful for dumps, but sometimes they are difficult to read. Also,
> interesting metrics can't be retrieved at runtime on request; we need to
> wait until Ignite writes them on a timeout or some other event.
>
> · JMX is useful for scalar metrics. Complex and table data can also
> be received, but it's difficult to read, filter and sort them without
> processing by specialized external tools. For the most frequently used
> cases, almost-duplicate metrics are created to show the data in an
> easy-to-read form.
>
> · Web-console is able to show table and complex data. Perhaps someday
> the web-console will contain all the necessary dashboards for most problem
> investigations, but some non-trivial queries will not be covered anyway.
> Also, the web-console needs additional infrastructure to work.
>
> · External "home-made" tools can be used for non-trivial cases. They
> cover highly specialized cases and usually can't be used as general-purpose
> tools.
>
> Sometimes we are forced to use more than one tool and join the data by
> hand (for example, a current thread dump and data from logs).
>
> For diagnostic purposes an RDBMS often provides system views (for example,
> DBA_% and V$% in Oracle), which can be queried by SQL. This solution makes
> all internal diagnostic information available in a readable form (with all
> possible filters and projections) without using any other internal or
> external tools. My proposal is to create similar system views in Ignite.
>
> I have implemented a working prototype (PR: [1]). It contains the
> following views:
>
> IGNITE_SYSTEM_VIEWS - Registered system views
> IGNITE_INSTANCE - Ignite instance
> IGNITE_JVM_THREADS - JVM threads
> IGNITE_JVM_RUNTIME - JVM runtime
> IGNITE_JVM_OS - JVM operating system
> IGNITE_CACHES - Ignite caches
> IGNITE_CACHE_CLUSTER_METRICS - Ignite cache cluster metrics
> IGNITE_CACHE_NODE_METRICS - Ignite cache node metrics
> IGNITE_CACHE_GROUPS - Cache groups
> IGNITE_NODES - Nodes in topology
> IGNITE_NODE_HOSTS - Node hosts
> IGNITE_NODE_ADDRESSES - Node addresses
> IGNITE_NODE_ATTRIBUTES - Node attributes
> IGNITE_NODE_METRICS - Node metrics
> IGNITE_TRANSACTIONS - Active transactions
> IGNITE_TRANSACTION_ENTRIES - Cache entries used by transaction
> IGNITE_TASKS - Active tasks
> IGNITE_PART_ASSIGNMENT - Partition assignment map
> IGNITE_PART_ALLOCATION - Partition allocation map
>
> Many more useful views could be implemented (executor diagnostics, SPI
> diagnostics, etc.).
>
> Some usage examples:
>
> Cache groups and their partitions used by transactions running for more
> than 5 minutes:
>
> SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS ENTITIES_CNT
> FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
> JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
> JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
> JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
> WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
> GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
>
> Average CPU load on server nodes grouped by operating system:
>
> SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
> FROM INFORMATION_SCHEMA.IGNITE_NODES n
> JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID AND
> na.NAME = 'os.name'
> JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
> WHERE n.IS_CLIENT = false
> GROUP BY na.VALUE
>
> Top 5 nodes by puts to cache 'cache':
>
> SELECT cm.NODE_ID, cm.CACHE_PUTS FROM
> INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
> WHERE cm.CACHE_NAME = 'cache'
> ORDER BY cm.CACHE_PUTS DESC
> LIMIT 5
>
> Is this implementation interesting to anyone else?
> Maybe some views are redundant? Which additional first-priority views
> should be implemented? Any other thoughts or proposals?
>
> [1] https://github.com/apache/ignite/pull/3413