Igor,

I think a built-in monitoring facility would add great value to the product. We have to deal with user performance issues pretty often, and it is always a pain to get to the bottom of the problem. We have to ask users for configuration, logs, system config, etc.

Instead, it would be great if we had a single big "switch". If a user has a performance issue, they turn it on, perform the problematic operations, and then dump all the collected data. We can collect dozens of things:
1) OS/JVM information
2) Ignite configs, logs, etc.
3) Performance data (CPU, RAM, IO)
4) Metrics
5) JMX data (both Ignite and JVM)
6) Some internal tracing (SQL query plans, how long it takes messages to pass between nodes, etc.)
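
To make item 1 concrete, here is a minimal sketch of what a per-node OS/JVM collector could look like. It uses only the standard java.lang.management platform beans; the class name DiagnosticsSnapshot and the key names are hypothetical and are not part of Ignite.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.RuntimeMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not an Ignite class): gathers basic OS/JVM facts on one node. */
final class DiagnosticsSnapshot {
    /** Reads OS and JVM information from the standard platform MXBeans. */
    static Map<String, String> collect() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();

        // LinkedHashMap keeps insertion order so the dump is stable and readable.
        Map<String, String> info = new LinkedHashMap<>();
        info.put("os.name", os.getName());
        info.put("os.arch", os.getArch());
        info.put("os.version", os.getVersion());
        info.put("os.availableProcessors", String.valueOf(os.getAvailableProcessors()));
        // May be negative on platforms where the load average is unavailable.
        info.put("os.systemLoadAverage", String.valueOf(os.getSystemLoadAverage()));
        info.put("jvm.name", rt.getVmName());
        info.put("jvm.version", rt.getVmVersion());
        info.put("jvm.uptimeMillis", String.valueOf(rt.getUptime()));
        info.put("jvm.inputArguments", String.join(" ", rt.getInputArguments()));
        info.put("jvm.heapUsedBytes", String.valueOf(mem.getHeapMemoryUsage().getUsed()));
        info.put("jvm.heapMaxBytes", String.valueOf(mem.getHeapMemoryUsage().getMax()));
        return info;
    }

    public static void main(String[] args) {
        // Print one "key = value" line per collected fact.
        collect().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Something like this could run on every node when the switch is turned on, with the resulting maps merged into the final dump. How the results would be shipped between nodes is left open here.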

I think the most important part here is a good infrastructure (interfaces) and API, so that we can start with something very simple, like collecting configs from all nodes or starting/stopping shell commands, and then gradually add more and more tracing facilities.

Thoughts?

Vladimir.

On Thu, Jul 14, 2016 at 11:36 PM, Igor Rudyak <irud...@gmail.com> wrote:

> Yakov, as of now I just have well-structured scripts to set up a Ganglia
> agent on Ignite hosts to monitor system metrics like CPU, RAM, IO, etc.
> (these scripts are already included in Ignite 1.6).
>
> I also experimented with displaying JVM metrics by providing a Java agent
> and specifying MBeans to collect metrics from, but it's a rather draft
> version. The second problem is that there are plenty of MBeans in Ignite
> and I just don't know which to select.
>
> Anyway, the original idea was to check with the community whether it makes
> sense to have such monitoring functionality out of the box.
>
> Igor Rudyak
>
> On Thu, Jul 14, 2016 at 1:05 AM, Yakov Zhdanov <yzhda...@apache.org> wrote:
>
> > Igor, can you please share the changes to scripts you did to support
> > monitoring? Can it be done by defining and exporting the JAVA_OPTS env
> > variable and then launching ignite.sh?
> >
> > Thanks!
> >
> > --Yakov
> >
> > 2016-07-13 22:45 GMT+03:00 Igor Rudyak <irud...@gmail.com>:
> >
> > > Hi guys,
> > >
> > > While experimenting with large Ignite clusters I found that the lack of
> > > monitoring is a rather critical problem. I know that Ignite provides a
> > > number of JMX MBeans to monitor custom metrics in addition to host
> > > system metrics (CPU, IO, RAM, ...). The problem is that there is no
> > > out-of-the-box solution to monitor all this.
> > >
> > > Thus you have to manually set up some kind of monitoring tool like
> > > Graphite, Grafana, Ganglia, etc. This involves setting up monitoring
> > > agents on all the nodes, uploading a JMX agent to all the nodes,
> > > selecting appropriate metrics from the plenty of JMX MBeans, preparing
> > > config files, and tuning Ignite shell scripts to include the "java
> > > agent" in the java launch command. Lots of work and pain each time you
> > > want to create a new Ignite cluster.
> > >
> > > Probably it makes sense to have all this out of the box, by slightly
> > > modifying the existing shell scripts and providing additional ones to
> > > bootstrap all the monitoring infrastructure?
> > >
> > > Igor Rudyak