I've created IEP-13 [1] to cover all cases. Feel free to create issues.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75962769
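For illustration, a query against the naming proposed below might look like
this. This is only a sketch: the IGNITE schema, the LOCAL_ prefix and the
TRANSACTIONS / LOCAL_TRANSACTIONS views are assumptions taken from this thread,
nothing is merged yet.

-- Hypothetical node-local view: transactions started on the node we are connected to.
SELECT XID, START_TIME
FROM IGNITE.LOCAL_TRANSACTIONS
ORDER BY START_TIME

-- Hypothetical cluster-wide view (a later phase): transactions across the
-- whole cluster, filterable by node.
SELECT NODE_ID, COUNT(*) AS ACTIVE_TX
FROM IGNITE.TRANSACTIONS
GROUP BY NODE_ID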
On Wed, Jan 24, 2018 at 6:10 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> Let's start with a single, most simple view, e.g. LOCAL_TRANSACTIONS. We will
> review and merge it along with the necessary infrastructure. Then we will
> handle the rest of the views in separate tickets and separate focused
> discussions.
>
> On Wed, Jan 24, 2018 at 5:29 PM, Alex Plehanov <plehanov.a...@gmail.com> wrote:
>
> > 1) It's not a point of principle, I can change the schema. INFORMATION_SCHEMA
> > was used because it already exists and is usually used for metadata tables
> > and views. Your proposal is to use the schema "IGNITE", do I understand you
> > right? BTW, for now we can't query the other (H2) meta tables from
> > INFORMATION_SCHEMA, so the Ignite system views are the only views available
> > for querying from this schema.
> > 2) Exactly for this reason the IGNITE_INSTANCE view is useful: to determine
> > which node we are connected to.
> > 3) For the first phase, in my opinion, local views will be enough.
> > Performance and caching of distributed views should be discussed in later
> > phases, when the distributed views implementation is planned. In the current
> > implementation I tried to use indexing for local views wherever possible.
> > 4) I don't think that JVM info is more critical information than, for
> > example, cache or node information. When are authorization capabilities
> > planned to be implemented?
> >
> > About local data: yes, we can rename all currently implemented views for
> > local node data to LOCAL_..., and (someday) create new whole-cluster views
> > (which use distributed requests) without a prefix or, for example, with a
> > CLUSTER_ prefix. But some views can show whole-cluster information using
> > only local node data, without distributed requests (for example
> > IGNITE_NODE_METRICS, IGNITE_PART_ASSIGNMENT, IGNITE_PART_ALLOCATION,
> > IGNITE_NODES, etc.). Are they local or cluster views in this concept? Which
> > prefix should be used? And what about caches? Are they local or cluster? On
> > the local node we can see cluster-wide caches (replicated and partitioned)
> > and caches for the current node only. The local cache list may differ from
> > node to node. Which prefix should be used for this view? And one more thing:
> > for some views it makes no sense to be cluster-wide (for example
> > IGNITE_INSTANCE). Should we name it LOCAL_INSTANCE without creating an
> > INSTANCE view?
> >
> > So, next steps: split the PR, change the schema name (IGNITE?), change the
> > view name for caches (CACHES, LOCAL_CACHES?).
> >
> > 2018-01-24 13:03 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >
> > > Hi Alex,
> > >
> > > System views could be an extremely valuable addition to Ignite. Ideally, a
> > > user should be able to monitor and manage the state of the whole cluster
> > > from a single SQL command line. We have had plans to implement this for a
> > > very long time. However, this is a very sensitive task which has to take a
> > > lot of moving pieces into account, such as usability, consistency,
> > > performance, security, etc.
> > >
> > > Let me point out several major concerns I see at the moment:
> > >
> > > 1) Usability: INFORMATION_SCHEMA
> > > This schema is part of the ANSI SQL standard. When creating system views,
> > > some vendors prefer to store them in a completely different predefined
> > > schema (Oracle, MS SQL). Others prefer to keep them in INFORMATION_SCHEMA
> > > directly. Both approaches could work.
> > > However, the latter breaks separation of concerns - we store typical
> > > metadata next to possibly sensitive system data. It also makes security
> > > management more complex - system data is very sensitive, and we can no
> > > longer simply grant a user access to INFORMATION_SCHEMA. Instead, we have
> > > to grant that access on a per-view basis. For this reason my preference is
> > > to store system tables in a separate schema, not in INFORMATION_SCHEMA.
> > >
> > > 2) Consistency: local data
> > > One of the implemented views is GridH2SysViewImplInstance. Normally SQL
> > > users communicate with Ignite through JDBC/ODBC drivers. These drivers are
> > > connected to a single node, typically a client node. Moreover, we will
> > > introduce a high-availability feature where drivers are able to connect to
> > > any address from a predefined list. That renders this view useless, as you
> > > do not know which node you are connected to. Also, local-only data cannot
> > > be joined in the general case - you will receive different results on
> > > different nodes. The same goes for transactions, JVM info, etc.
> > >
> > > 3) Performance
> > > Suppose we fixed the consistency of transactions and now this view shows
> > > transactions in the whole cluster with the possibility to filter them by
> > > node - this is what a user would expect out of the box. Another problem
> > > appears then - performance. How would we collect the necessary data? How
> > > would we handle joins, when a particular view could be scanned multiple
> > > times during query execution? How would we achieve sensible consistency?
> > > Most probably we would collect remote data once when the query is started,
> > > cache it somehow at the query session level, and then reuse it during
> > > joins. But again, this should be discussed separately.
> > >
> > > 4) Security: JVM info
> > > We should define clear boundaries for what info is exposed. JVM data along
> > > with running threads is critically sensitive information. We should not
> > > expose it until we have authorization capabilities.
> > >
> > > In order to start moving this code from prototype to production state, we
> > > should start with the most simple and consistent views, e.g. IGNITE_CACHES.
> > > Let's move it to a separate PR, review the infrastructure code, review the
> > > view implementation, agree on proper naming and placement, and merge it.
> > > Then each and every view (or group of related views) should be discussed
> > > and reviewed separately.
> > >
> > > As for node-local stuff, maybe we should move it to a separate schema, or
> > > mark it with a special prefix. E.g. "IGNITE.TRANSACTIONS" - all
> > > transactions in the cluster, "IGNITE.LOCAL_TRANSACTIONS" - transactions on
> > > the local node. In this case we will be able to merge the "local" stuff
> > > shortly, and implement the more complex but at the same time much more
> > > useful distributed stuff later on.
> > >
> > > Makes sense?
> > >
> > > Vladimir.
> > >
> > > On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > >
> > > > Hello, Igniters!
> > > >
> > > > For Ignite diagnostics it's usually helpful to get some Ignite internals
> > > > information. But currently, in my opinion, there are no convenient tools
> > > > for this purpose:
> > > >
> > > > · Some issues can be solved by analyzing log files. Log files are useful
> > > > for dumps, but sometimes they are difficult to read.
> > > > Also, interesting metrics can't be retrieved at runtime on request; we
> > > > need to wait until Ignite writes these metrics on a timeout or other
> > > > events.
> > > >
> > > > · JMX is useful for scalar metrics. Complex and tabular data can also be
> > > > retrieved, but it's difficult to read, filter and sort it without
> > > > processing by specialized external tools. For the most frequently used
> > > > cases, almost-duplicate metrics are created to show data in an
> > > > easy-to-read form.
> > > >
> > > > · Web-console is able to show tabular and complex data. Perhaps someday
> > > > web-console will contain all the dashboards necessary for most problem
> > > > investigations, but some non-trivial queries will not be covered anyway.
> > > > Also, web-console needs additional infrastructure to work.
> > > >
> > > > · External "home-made" tools can be used for non-trivial cases. They
> > > > cover highly specialized cases and usually can't be used as general
> > > > purpose tools.
> > > >
> > > > Sometimes we are forced to use more than one tool and join data by hand
> > > > (for example, a current thread dump and data from logs).
> > > >
> > > > For diagnostic purposes an RDBMS often provides system views (for
> > > > example, DBA_% and V$% in Oracle), which can be queried with SQL. This
> > > > approach makes all internal diagnostic information available in a
> > > > readable form (with all possible filters and projections) without using
> > > > any other internal or external tools. My proposal is to create similar
> > > > system views in Ignite.
> > > >
> > > > I have implemented a working prototype (PR: [1]). It contains the
> > > > following views:
> > > >
> > > > IGNITE_SYSTEM_VIEWS - Registered system views
> > > > IGNITE_INSTANCE - Ignite instance
> > > > IGNITE_JVM_THREADS - JVM threads
> > > > IGNITE_JVM_RUNTIME - JVM runtime
> > > > IGNITE_JVM_OS - JVM operating system
> > > > IGNITE_CACHES - Ignite caches
> > > > IGNITE_CACHE_CLUSTER_METRICS - Ignite cache cluster metrics
> > > > IGNITE_CACHE_NODE_METRICS - Ignite cache node metrics
> > > > IGNITE_CACHE_GROUPS - Cache groups
> > > > IGNITE_NODES - Nodes in topology
> > > > IGNITE_NODE_HOSTS - Node hosts
> > > > IGNITE_NODE_ADDRESSES - Node addresses
> > > > IGNITE_NODE_ATTRIBUTES - Node attributes
> > > > IGNITE_NODE_METRICS - Node metrics
> > > > IGNITE_TRANSACTIONS - Active transactions
> > > > IGNITE_TRANSACTION_ENTRIES - Cache entries used by transactions
> > > > IGNITE_TASKS - Active tasks
> > > > IGNITE_PART_ASSIGNMENT - Partition assignment map
> > > > IGNITE_PART_ALLOCATION - Partition allocation map
> > > >
> > > > Many more useful views can be implemented (executor diagnostics, SPI
> > > > diagnostics, etc.).
> > > >
> > > > Some usage examples:
> > > >
> > > > Cache groups and their partitions which are used by transactions running
> > > > longer than 5 minutes:
> > > >
> > > > SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, COUNT(*) AS ENTITIES_CNT
> > > > FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
> > > > JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
> > > > JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
> > > > JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
> > > > WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
> > > > GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
> > > >
> > > > Average CPU load on server nodes grouped by operating system:
> > > >
> > > > SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
> > > > FROM INFORMATION_SCHEMA.IGNITE_NODES n
> > > > JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID
> > > >   AND na.NAME = 'os.name'
> > > > JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
> > > > WHERE n.IS_CLIENT = false
> > > > GROUP BY na.VALUE
> > > >
> > > > Top 5 nodes by puts to the cache 'cache':
> > > >
> > > > SELECT cm.NODE_ID, cm.CACHE_PUTS
> > > > FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
> > > > WHERE cm.CACHE_NAME = 'cache'
> > > > ORDER BY cm.CACHE_PUTS DESC
> > > > LIMIT 5
> > > >
> > > > Is this implementation interesting to anyone else? Are any of the views
> > > > redundant? Which additional first-priority views should be implemented?
> > > > Any other thoughts or proposals?
> > > >
> > > > [1] https://github.com/apache/ignite/pull/3413
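If the next steps discussed above are taken (a separate IGNITE schema instead
of INFORMATION_SCHEMA and a LOCAL_ prefix for node-local views), the first
usage example above would translate roughly as follows. This is only a sketch
under those assumptions; the final schema and view names are still open.

-- Hypothetical naming: the IGNITE schema and LOCAL_ prefix are not final.
SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, COUNT(*) AS ENTITIES_CNT
FROM IGNITE.LOCAL_TRANSACTIONS t
JOIN IGNITE.LOCAL_TRANSACTION_ENTRIES te ON t.XID = te.XID
JOIN IGNITE.CACHES c ON te.CACHE_NAME = c.NAME
JOIN IGNITE.CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION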