Igniters,

As a preface, Alexey Kukushkin laid out an insightful and thorough explanation 
of what’s wrong with Ignite logs from a DevOps perspective, how the community 
can tackle the gaps, and how our efforts will pay off if we take his advice 
into consideration: 
http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-not-friendly-for-Monitoring-td20802.html

In short, Ignite log events (errors, warnings and non-severe messages) are not 
assigned unique identifiers. 
Why does a mature project like Ignite need them?

First, to have a human-friendly glossary of error messages and warnings (see 
the MySQL [1] and MongoDB [2] examples) that simplifies troubleshooting and 
debugging on the dev side. Actually, we planned to do this back in 2016! [3]

Second, it turns out that popular DevOps monitoring tools such as DynaTrace 
[4] and Nagios [5] can easily analyze the IDs of log events and help automate 
their processing or trigger notifications. For instance, if the “node left” log 
message were labeled with an ID, DynaTrace could detect that event and, by 
looking at overall memory usage (JMX), decide what to do next - just send an 
email to an admin or add a new node to the cluster.
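
To make the “node left” scenario concrete, here is a rough sketch of the kind 
of rule a monitoring tool could apply once log lines carry IDs. Everything in 
it is an assumption made for illustration: the "[IGN-<id>]" prefix, the ID 1001 
for “node left”, the 0.8 memory threshold and the class and method names. None 
of this exists in Ignite or in DynaTrace today.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative rule only: reacts to a hypothetical "node left" event ID. */
final class NodeLeftRule {
    // Assumed log line shape "[IGN-<id>] <message>"; Ignite defines no such format today.
    private static final Pattern EVENT_ID = Pattern.compile("\\[IGN-(\\d{1,9})\\]");

    // Hypothetical ID for the "node left" event.
    static final int NODE_LEFT = 1001;

    enum Action { NONE, EMAIL_ADMIN, ADD_NODE }

    /** Extracts the event ID from a log line, if the line carries one. */
    static OptionalInt eventId(String logLine) {
        Matcher m = EVENT_ID.matcher(logLine);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /**
     * Decides what to do for a log line given the fraction of cluster memory in use
     * (for example, a value read over JMX by the monitoring tool).
     */
    static Action decide(String logLine, double memoryUsedFraction) {
        OptionalInt id = eventId(logLine);
        if (!id.isPresent() || id.getAsInt() != NODE_LEFT)
            return Action.NONE;
        // The 0.8 threshold is an arbitrary illustrative value.
        return memoryUsedFraction > 0.8 ? Action.ADD_NODE : Action.EMAIL_ADMIN;
    }
}
```

The point is that the rule matches on a stable number rather than on the 
message text, so rewording the message would not break the monitoring setup.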

My proposal is to start putting the glossary together, making Ignite ready for 
enterprise-grade monitoring systems and DevOps! 

As a first step, let’s define the subsystems of Ignite and allocate ID ranges 
among them:
- networking (discovery, communication) - 1000 - 3000
- memory and persistence - 4000 - 6000
- key-value, caching - 7000 - 9000
- SQL - 10000 - 11000
- etc.
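
The ranges above could be captured in code roughly as follows. This is only a 
sketch: the enum and method names are hypothetical, and I am assuming the range 
bounds are inclusive. IDs that fall between ranges (e.g. 3001 - 3999) are 
treated as unassigned.

```java
import java.util.Optional;

/** Hypothetical mapping of Ignite subsystems to the proposed log event ID ranges. */
enum LogSubsystem {
    NETWORKING(1000, 3000),              // discovery, communication
    MEMORY_AND_PERSISTENCE(4000, 6000),
    KEY_VALUE_CACHING(7000, 9000),
    SQL(10000, 11000);

    private final int firstId;
    private final int lastId;

    LogSubsystem(int firstId, int lastId) {
        this.firstId = firstId;
        this.lastId = lastId;
    }

    /** Returns true if the ID lies in this subsystem's range (bounds assumed inclusive). */
    boolean owns(int eventId) {
        return eventId >= firstId && eventId <= lastId;
    }

    /** Finds the subsystem that owns the given ID, or empty if the ID is unassigned. */
    static Optional<LogSubsystem> forEventId(int eventId) {
        for (LogSubsystem s : values()) {
            if (s.owns(eventId))
                return Optional.of(s);
        }
        return Optional.empty();
    }
}
```

For example, LogSubsystem.forEventId(1500) resolves to NETWORKING, while 
LogSubsystem.forEventId(3500) comes back empty because that ID sits in a gap.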

Is everyone OK with this format and the overall endeavor? 

[1] https://dev.mysql.com/doc/refman/5.5/en/error-messages-server.html
[2] https://github.com/mongodb/mongo/blob/master/src/mongo/base/error_codes.err
[3] https://issues.apache.org/jira/browse/IGNITE-3690
[4] https://www.dynatrace.com/capabilities/log-analytics/
[5] https://www.nagios.com/solutions/log-monitoring/
