Vova, could you confirm that https://issues.apache.org/jira/browse/IGNITE-7527 is ready to be merged?
On Wed, Feb 14, 2018 at 12:01 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

I would start with NODES and NODE_ATTRIBUTES as the simplest ones.

On Tue, Feb 13, 2018 at 4:10 AM, Denis Magda <dma...@apache.org> wrote:

Alex P, sounds like a good plan to me.

Vladimir, do you have any suggestions or corrections?

— Denis

On Feb 12, 2018, at 4:57 AM, Alex Plehanov <plehanov.a...@gmail.com> wrote:

The views engine and the first view are almost ready to merge (review comments are resolved). Which views should we take next? My proposal is NODES, NODE_ATTRIBUTES, NODE_METRICS, NODE_HOSTS and NODE_ADDRESSES, since these views are clear and all topology data is available on each node. Any objections?

2018-01-25 16:27 GMT+03:00 Alex Plehanov <plehanov.a...@gmail.com>:

Anton, Vladimir, I've made some fixes. There is only one view left, and it has been renamed to 'IGNITE.LOCAL_TRANSACTIONS'.

High-level design of the solution: when IgniteH2Indexing starts, it creates and starts a new GridH2SysViewProcessor, which creates and registers in H2 (via its own table engine) all implementations of system views. Each system view implementation extends the base abstract class GridH2SysView. A view implementation describes its columns, their types and indexes in the constructor and must override the getRows method for data retrieval (this method is called by the H2-compatible table and index implementations for Ignite system views). Almost no changes to the existing parsing engine were needed, except in a few places where a GridH2Table instance was expected but system views use a different class.

New PR: [1]. Please have a look.

[1] https://github.com/apache/ignite/pull/3433
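As a rough illustration of the design described above, here is a simplified, self-contained sketch: a view declares its columns in the constructor and overrides a single method that produces rows. The names SysView, NodesView and SysViewSketch are hypothetical stand-ins; the real base class in the PR is GridH2SysView, also declares column types and indexes, and plugs into H2 via a table engine, so its actual signatures may differ.

import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the base abstract system view class.
abstract class SysView {
    private final String name;
    private final List<String> columns;

    protected SysView(String name, String... columns) {
        this.name = name;
        this.columns = Arrays.asList(columns);
    }

    // Returns the view rows; called when the view is scanned.
    abstract Iterable<Object[]> getRows();

    String name() { return name; }
    List<String> columns() { return columns; }
}

// Hypothetical view exposing locally known topology nodes.
class NodesView extends SysView {
    NodesView() {
        super("NODES", "ID", "CONSISTENT_ID", "IS_CLIENT");
    }

    @Override Iterable<Object[]> getRows() {
        // In the real implementation the rows would come from the node's
        // discovery/topology data; hard-coded here for illustration only.
        return Arrays.asList(
            new Object[] {"node-1-id", "node-1", false},
            new Object[] {"node-2-id", "node-2", true});
    }
}

public class SysViewSketch {
    public static void main(String[] args) {
        SysView view = new NodesView();
        System.out.println(view.name() + " " + view.columns());
        for (Object[] row : view.getRows())
            System.out.println(Arrays.toString(row));
    }
}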
2018-01-24 19:12 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:

I've created IEP-13 [1] to cover all the cases. Feel free to create issues.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75962769

On Wed, Jan 24, 2018 at 6:10 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

Let's start with a single and simplest view, e.g. LOCAL_TRANSACTIONS. We will review and merge it along with the necessary infrastructure, then handle the remaining views in separate tickets and separate focused discussions.

On Wed, Jan 24, 2018 at 5:29 PM, Alex Plehanov <plehanov.a...@gmail.com> wrote:

1) It's not a principal point, I can change the schema. INFORMATION_SCHEMA was used because it already exists and is usually used for metadata tables and views. Your proposal is to use the schema "IGNITE", do I understand you right? BTW, for now we can't query other (H2) meta tables from INFORMATION_SCHEMA, so the Ignite system views are the only views available for querying from this schema.

2) Exactly for this reason the IGNITE_INSTANCE view is useful: to determine which node we are connected to.

3) As the first phase, in my opinion, local views will be enough. Performance and caching of distributed views should be discussed in the next phases, when the distributed views implementation is planned. In the current implementation I tried to use indexing for local views wherever possible.

4) I don't think that JVM info is more critical information than, for example, cache or node information. When are authorization capabilities planned to be implemented?

About local data: yes, we can rename all currently implemented views over local node data to LOCAL_..., and (someday) create new whole-cluster views (which use distributed requests) without a prefix or, for example, with a CLUSTER_ prefix. But some views can show whole-cluster information using only local node data, without distributed requests (for example IGNITE_NODE_METRICS, IGNITE_PART_ASSIGNMENT, IGNITE_PART_ALLOCATION, IGNITE_NODES, etc.). Are they local or cluster views in this concept? Which prefix should be used? And what about caches, are they local or cluster? On a local node we can see cluster-wide caches (replicated and distributed) as well as caches for the current node only. The local cache list may differ from node to node. Which prefix should be used for this view? And one more thing: for some views it makes no sense to make them cluster-wide (for example IGNITE_INSTANCE). Should we name it LOCAL_INSTANCE without creating an INSTANCE view?

So, next steps: split the PR, change the schema name (IGNITE?), change the view name for caches (CACHES, LOCAL_CACHES?).
2018-01-24 13:03 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

Hi Alex,

System views could be an extremely valuable addition to Ignite. Ideally, a user should be able to monitor and manage the state of the whole cluster from a single SQL command line. We have had plans to implement this for a very long time. However, it is a very sensitive task which has to take a lot of moving pieces into account, such as usability, consistency, performance, security, etc.

Let me point out several major concerns I see at the moment:

1) Usability: INFORMATION_SCHEMA
This schema is part of the ANSI SQL standard. When creating system views, some vendors prefer to store them in a completely different predefined schema (Oracle, MS SQL). Others keep them in INFORMATION_SCHEMA directly. Both approaches could work. However, the latter breaks separation of concerns: we store typical metadata next to possibly sensitive system data. It also makes security management more complex: system data is very sensitive, and we can no longer simply grant a user access to INFORMATION_SCHEMA; instead, we have to grant access on a per-view basis. For this reason my preference is to store system tables in a separate schema, not in INFORMATION_SCHEMA.

2) Consistency: local data
One of the implemented views is GridH2SysViewImplInstance. Normally SQL users communicate with Ignite through the JDBC/ODBC drivers. These drivers are connected to a single node, typically a client node. Moreover, we will introduce a high-availability feature where drivers can connect to any address from a predefined list. That renders this view useless, as you do not know which node you are connected to. Also, local-only data cannot be joined in the general case: you will receive different results on different nodes. The same goes for transactions, JVM info, etc.

3) Performance
Suppose we fixed the consistency of transactions and this view now shows transactions in the whole cluster with the ability to filter them by node; this is what a user would expect out of the box. Then another problem appears: performance. How would we collect the necessary data? How would we handle joins, when a particular view could be scanned multiple times during query execution? How do we achieve sensible consistency? Most probably we would collect remote data once when the query is started, cache it somehow at the query session level, and then re-use it during joins. But again, this should be discussed separately.

4) Security: JVM info
We should define clear boundaries of what information is exposed. JVM data along with running threads is critically sensitive information. We should not expose it until we have authorization capabilities.

In order to start moving this code from prototype to production state we should start with the most simple and consistent views, e.g. IGNITE_CACHES. Let's move it to a separate PR, review the infrastructure code, review the view implementation, agree on proper naming and placement, and merge it. Then each and every view (or group of related views) should be discussed and reviewed separately.

As far as node-local stuff is concerned, maybe we should move it to a separate schema, or mark it with a special prefix, e.g. "IGNITE.TRANSACTIONS" for all transactions in the cluster and "IGNITE.LOCAL_TRANSACTIONS" for transactions on the local node. In this case we will be able to merge the "local" stuff shortly, and implement the more complex but at the same time much more useful distributed stuff later on.

Makes sense?

Vladimir.
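To make point 2 above concrete, here is a minimal sketch of how a client typically reaches such views through the Ignite thin JDBC driver: the connection is pinned to whichever node the URL points at, so a node-local view describes only that node. The view name follows the IGNITE.LOCAL_TRANSACTIONS naming proposed in this thread and is not an existing API until the PR is merged; the address and column access are illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LocalViewJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Ignite thin JDBC driver (ignite-core must be on the classpath).
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

        // The thin driver talks to exactly one node (here 127.0.0.1),
        // so any "local" view queried below describes that node only.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             Statement stmt = conn.createStatement();
             // LOCAL_TRANSACTIONS in the IGNITE schema is the naming proposed
             // in this thread, not something that exists today.
             ResultSet rs = stmt.executeQuery("SELECT * FROM IGNITE.LOCAL_TRANSACTIONS")) {
            while (rs.next())
                System.out.println(rs.getString("XID"));
        }
    }
}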
On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov <plehanov.a...@gmail.com> wrote:

Hello, Igniters!

For Ignite diagnostics it is usually helpful to get some information about Ignite internals. But currently, in my opinion, there are no convenient tools for this purpose:

· Some issues can be solved by analyzing log files. Log files are useful for dumps, but sometimes they are difficult to read. Also, interesting metrics can't be obtained at runtime on request; we have to wait until Ignite writes them out on a timeout or some other event.

· JMX is useful for scalar metrics. Complex and table data can also be obtained, but it is difficult to read, filter and sort without processing by specialized external tools. For the most frequently used cases, almost duplicate metrics are created just to show data in an easy-to-read form.

· Web-console is able to show table and complex data. Perhaps someday web-console will contain all the necessary dashboards for investigating most problems, but some non-trivial queries will not be covered anyway. Also, web-console needs additional infrastructure to work.

· External "home-made" tools can be used for non-trivial cases. They cover highly specialized cases and usually can't be used as general-purpose tools.

Sometimes we are forced to use more than one tool and join the data by hand (for example, a current thread dump and data from logs).

RDBMSs often provide system views for diagnostic purposes (for example, DBA_% and V$% in Oracle), which can be queried with SQL. This approach makes all internal diagnostic information available in a readable form (with all possible filters and projections) without any other internal or external tools. My proposal is to create similar system views in Ignite.

I have implemented a working prototype (PR: [1]). It contains the following views:

IGNITE_SYSTEM_VIEWS - registered system views
IGNITE_INSTANCE - Ignite instance
IGNITE_JVM_THREADS - JVM threads
IGNITE_JVM_RUNTIME - JVM runtime
IGNITE_JVM_OS - JVM operating system
IGNITE_CACHES - Ignite caches
IGNITE_CACHE_CLUSTER_METRICS - Ignite cache cluster metrics
IGNITE_CACHE_NODE_METRICS - Ignite cache node metrics
IGNITE_CACHE_GROUPS - cache groups
IGNITE_NODES - nodes in topology
IGNITE_NODE_HOSTS - node hosts
IGNITE_NODE_ADDRESSES - node addresses
IGNITE_NODE_ATTRIBUTES - node attributes
IGNITE_NODE_METRICS - node metrics
IGNITE_TRANSACTIONS - active transactions
IGNITE_TRANSACTION_ENTRIES - cache entries used by a transaction
IGNITE_TASKS - active tasks
IGNITE_PART_ASSIGNMENT - partition assignment map
IGNITE_PART_ALLOCATION - partition allocation map

Many more useful views could be implemented (executor diagnostics, SPI diagnostics, etc.).
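As a minimal sketch of how these prototype views can also be reached from the embedded Java API (assuming the prototype's INFORMATION_SCHEMA placement, which is still under discussion above; the cache name "dummy" is an arbitrary entry point for running schema-qualified SQL):

import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class NodesViewQuerySketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Any cache works as an entry point for schema-qualified SQL;
            // "dummy" is created only to have something to run the query on.
            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("dummy");

            // Query one of the prototype views; the schema placement
            // (INFORMATION_SCHEMA vs. a dedicated IGNITE schema) is exactly
            // what the discussion above is about.
            List<List<?>> rows = cache.query(new SqlFieldsQuery(
                "SELECT ID, IS_CLIENT FROM INFORMATION_SCHEMA.IGNITE_NODES")).getAll();

            for (List<?> row : rows)
                System.out.println(row);
        }
    }
}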
Some usage examples:

Cache groups and their partitions used by transactions running for more than 5 minutes:

SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, COUNT(*) AS ENTITIES_CNT
FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION

Average CPU load on server nodes grouped by operating system:

SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
FROM INFORMATION_SCHEMA.IGNITE_NODES n
JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID AND na.NAME = 'os.name'
JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
WHERE n.IS_CLIENT = false
GROUP BY na.VALUE

Top 5 nodes by puts to the cache 'cache':

SELECT cm.NODE_ID, cm.CACHE_PUTS
FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
WHERE cm.CACHE_NAME = 'cache'
ORDER BY cm.CACHE_PUTS DESC
LIMIT 5

Is this implementation interesting to anyone else? Are any of the views redundant? Which additional views should be implemented with first priority? Any other thoughts or proposals?

[1] https://github.com/apache/ignite/pull/3413