Let's start with a single, simplest view, e.g. LOCAL_TRANSACTIONS. We will
review and merge it along with the necessary infrastructure. Then we will
handle the remaining views in separate tickets and separate focused
discussions.
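Once that is merged, querying should look roughly like this (the IGNITE
schema and the view name follow the proposal below; the column set is
illustrative and still to be agreed on during review):

SELECT XID, START_TIME, STATE
FROM IGNITE.LOCAL_TRANSACTIONS
WHERE STATE = 'ACTIVE'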
On Wed, Jan 24, 2018 at 5:29 PM, Alex Plehanov <plehanov.a...@gmail.com>
wrote:

> 1) It's not a matter of principle, I can change the schema.
> INFORMATION_SCHEMA was used because it already exists and is usually used
> for metadata tables and views. Your proposal is to use the schema
> "IGNITE", do I understand you right? BTW, for now we can't query the
> other (H2) meta tables from INFORMATION_SCHEMA, so the Ignite system
> views are the only views available to query from this schema.
> 2) Exactly for this reason the IGNITE_INSTANCE view is useful: to
> determine which node we are connected to.
> 3) As the first phase, in my opinion, local views will be enough.
> Performance and caching of distributed views should be discussed in the
> next phases, when the distributed views implementation is planned. In the
> current implementation I tried to use indexing for local views wherever
> possible.
> 4) I don't think that JVM info is more critical information than, for
> example, cache or node information. When are authorization capabilities
> planned to be implemented?
>
> About local data: yes, we can rename all currently implemented views for
> local node data as LOCAL_..., and create (someday) new whole-cluster
> views (which use distributed requests) without a prefix or, for example,
> with a CLUSTER_ prefix. But some views can show whole-cluster information
> using only local node data, without distributed requests (for example
> IGNITE_NODE_METRICS, IGNITE_PART_ASSIGNMENT, IGNITE_PART_ALLOCATION,
> IGNITE_NODES, etc.). Are they local or cluster views in this concept?
> Which prefix should be used? And what about caches? Are they local or
> cluster? On a local node we can see cluster-wide caches (replicated and
> partitioned) as well as caches for the current node only. The local cache
> list may differ from node to node. Which prefix should be used for this
> view? And one more thing: for some views there is no sense in making them
> cluster-wide (for example IGNITE_INSTANCE). Should we name it
> LOCAL_INSTANCE without creating an INSTANCE view?
>
> So, next steps: split the PR, change the schema name (IGNITE?), change
> the view name for caches (CACHES, LOCAL_CACHES?)
>
>
> 2018-01-24 13:03 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>
> > Hi Alex,
> >
> > System views could be an extremely valuable addition to Ignite.
> > Ideally, a user should be able to monitor and manage the state of the
> > whole cluster with a single SQL command line. We have had plans to
> > implement this for a very long time. However, this is a very sensitive
> > task which should take a lot of moving pieces into account, such as
> > usability, consistency, performance, security, etc.
> >
> > Let me point out several major concerns I see at the moment:
> >
> > 1) Usability: INFORMATION_SCHEMA
> > This schema is part of the ANSI SQL standard. When creating system
> > views, some vendors prefer to store them in a completely different
> > predefined schema (Oracle, MS SQL). Others prefer to keep them in
> > INFORMATION_SCHEMA directly. Both approaches could work. However, the
> > latter breaks separation of concerns - we store typical metadata next
> > to possibly sensitive system data. It also makes security management
> > more complex - system data is very sensitive, and now we cannot simply
> > grant a user access to INFORMATION_SCHEMA. Instead, we have to grant
> > that access on a per-view basis.
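> > For illustration, in a system with SQL permissions the difference
> > would look roughly like this (hypothetical syntax - Ignite has no
> > GRANT statement today):
> >
> > -- Dedicated schema: one rule covers every system view.
> > GRANT SELECT ON SCHEMA IGNITE TO monitoring_role;
> >
> > -- System views inside INFORMATION_SCHEMA: every sensitive view needs
> > -- its own rule, next to the ordinary metadata views.
> > GRANT SELECT ON INFORMATION_SCHEMA.IGNITE_CACHES TO monitoring_role;
> > GRANT SELECT ON INFORMATION_SCHEMA.IGNITE_NODES TO monitoring_role;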
> > For this reason my preference is to store system tables in a separate
> > schema, not in INFORMATION_SCHEMA.
> >
> > 2) Consistency: local data
> > One of the implemented views is GridH2SysViewImplInstance. Normally SQL
> > users communicate with Ignite through JDBC/ODBC drivers. These drivers
> > are connected to a single node, typically a client node. Moreover, we
> > will introduce a high-availability feature where drivers are able to
> > connect to any address from a predefined list. This renders the view
> > useless, as you do not know which node you are connected to. Also,
> > local-only data cannot be joined in the general case - you will receive
> > different results on different nodes. The same goes for transactions,
> > JVM info, etc.
> >
> > 3) Performance
> > Suppose we fixed the consistency of transactions, and this view now
> > shows transactions in the whole cluster with the possibility to filter
> > them by node - this is what a user would expect out of the box. Then
> > another problem appears - performance. How would we collect the
> > necessary data? How would we handle joins, when a particular view could
> > be scanned multiple times during query execution? How would we achieve
> > sensible consistency? Most probably we would collect remote data once
> > when the query is started, cache it somehow at the query session level,
> > and then re-use it during joins. But again, this should be discussed
> > separately.
> >
> > 4) Security: JVM info
> > We should define clear boundaries of what info is exposed. JVM data,
> > along with running threads, is critically sensitive information. We
> > should not expose it until we have authorization capabilities.
> >
> > In order to start moving this code from prototype to production state
> > we should start with the most simple and consistent views, e.g.
> > IGNITE_CACHES. Let's move it to a separate PR, review the
> > infrastructure code, review the view implementation, agree on proper
> > naming and placement, and merge it. Then each and every view (or group
> > of related views) should be discussed and reviewed separately.
> >
> > As far as node-local stuff goes, maybe we should move it to a separate
> > schema, or mark it with a special prefix. E.g. "IGNITE.TRANSACTIONS" -
> > all transactions in the cluster, "IGNITE.LOCAL_TRANSACTIONS" -
> > transactions on the local node. In this case we will be able to merge
> > the "local" stuff shortly, and implement the more complex but at the
> > same time much more useful distributed stuff later on.
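> > The two flavors would then be queried like this (a sketch only -
> > neither view exists yet):
> >
> > -- All transactions in the cluster (requires distributed requests).
> > SELECT * FROM IGNITE.TRANSACTIONS;
> >
> > -- Transactions on the node we are connected to (local data only).
> > SELECT * FROM IGNITE.LOCAL_TRANSACTIONS;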
> >
> > Makes sense?
> >
> > Vladimir.
> >
> >
> > On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov
> > <plehanov.a...@gmail.com> wrote:
> >
> > > Hello, Igniters!
> > >
> > > For Ignite diagnostics it is usually helpful to get some Ignite
> > > internals information. But currently, in my opinion, there are no
> > > convenient tools for this purpose:
> > >
> > > · Some issues can be solved by analyzing log files. Log files are
> > > useful for dumps, but sometimes they are difficult to read. Also,
> > > interesting metrics can't be requested at runtime; we need to wait
> > > until Ignite writes these metrics on a timeout or other event.
> > >
> > > · JMX is useful for scalar metrics. Complex and tabular data can
> > > also be received, but it is difficult to read, filter and sort
> > > without processing by specialized external tools. For the most
> > > frequently used cases, almost-duplicate metrics are created just to
> > > show data in an easy-to-read form.
> > >
> > > · Web-console is able to show tabular and complex data. Perhaps
> > > someday web-console will contain all the necessary dashboards for
> > > most problem investigations, but some non-trivial queries will not
> > > be covered anyway. Also, web-console needs additional infrastructure
> > > to work.
> > >
> > > · External "home-made" tools can be used for non-trivial cases. They
> > > cover highly specialized cases and usually can't be used as
> > > general-purpose tools.
> > >
> > > Sometimes we are forced to use more than one tool and join the data
> > > by hand (for example, a current thread dump and data from logs).
> > >
> > > For diagnostic purposes an RDBMS often provides system views (for
> > > example, DBA_% and V$% in Oracle), which can be queried with SQL.
> > > This solution makes all internal diagnostic information available in
> > > a readable form (with all possible filters and projections) without
> > > using any other internal or external tools. My proposal is to create
> > > similar system views in Ignite.
> > >
> > > I have implemented a working prototype (PR: [1]). It contains these
> > > views:
> > >
> > > IGNITE_SYSTEM_VIEWS - Registered system views
> > > IGNITE_INSTANCE - Ignite instance
> > > IGNITE_JVM_THREADS - JVM threads
> > > IGNITE_JVM_RUNTIME - JVM runtime
> > > IGNITE_JVM_OS - JVM operating system
> > > IGNITE_CACHES - Ignite caches
> > > IGNITE_CACHE_CLUSTER_METRICS - Ignite cache cluster metrics
> > > IGNITE_CACHE_NODE_METRICS - Ignite cache node metrics
> > > IGNITE_CACHE_GROUPS - Cache groups
> > > IGNITE_NODES - Nodes in topology
> > > IGNITE_NODE_HOSTS - Node hosts
> > > IGNITE_NODE_ADDRESSES - Node addresses
> > > IGNITE_NODE_ATTRIBUTES - Node attributes
> > > IGNITE_NODE_METRICS - Node metrics
> > > IGNITE_TRANSACTIONS - Active transactions
> > > IGNITE_TRANSACTION_ENTRIES - Cache entries used by transactions
> > > IGNITE_TASKS - Active tasks
> > > IGNITE_PART_ASSIGNMENT - Partition assignment map
> > > IGNITE_PART_ALLOCATION - Partition allocation map
> > >
> > > Many more useful views can be implemented (executors diagnostics,
> > > SPIs diagnostics, etc.).
> > >
> > > Some usage examples:
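> > > List the registered system views themselves (column list is
> > > illustrative):
> > >
> > > SELECT * FROM INFORMATION_SCHEMA.IGNITE_SYSTEM_VIEWS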
> > >
> > > Cache groups and their partitions that are used by a transaction
> > > running for more than 5 minutes:
> > >
> > > SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, COUNT(*) AS ENTITIES_CNT
> > > FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
> > > JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
> > > JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
> > > JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
> > > WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
> > > GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
> > >
> > > Average CPU load on server nodes, grouped by operating system:
> > >
> > > SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
> > > FROM INFORMATION_SCHEMA.IGNITE_NODES n
> > > JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID
> > >   AND na.NAME = 'os.name'
> > > JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
> > > WHERE n.IS_CLIENT = false
> > > GROUP BY na.VALUE
> > >
> > > Top 5 nodes by puts to the cache 'cache':
> > >
> > > SELECT cm.NODE_ID, cm.CACHE_PUTS
> > > FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
> > > WHERE cm.CACHE_NAME = 'cache'
> > > ORDER BY cm.CACHE_PUTS DESC
> > > LIMIT 5
> > >
> > > Is this implementation interesting to anyone else? Maybe some views
> > > are redundant? Which additional first-priority views should be
> > > implemented? Any other thoughts or proposals?
> > >
> > > [1] https://github.com/apache/ignite/pull/3413