Hello, Alexey. Thanks for the review.
My understanding is the following:

We will have 3 in-depth tools to find issues in the cluster:

1. Metrics + System views - data that describes Ignite entities at a very high level.
2. Profiling - a tool to find out which specific queries or transactions are slow.
   In many cases, this knowledge is enough to fix the issue (rewrite the query, redesign the transaction flow, etc.).
3. Tracing - a tool to find out why one of 1000 identical queries was slow.
   The most detailed view of Ignite internal processes.

> For example, a user would not be able to match a long task with a long job in
> that task.

This is not true.
The profiling report will aggregate data from all nodes, so it will contain both:

* the total time of the task
* the time of each job in the task.

> On June 8, 2020, at 12:52, Alexey Goncharuk <alexey.goncha...@gmail.com>
> wrote:
>
> Nikita, Igniters,
>
> I left a few comments on the tool itself in the PR.
>
> However, I would like to reiterate and discuss why a user would prefer to
> use the profiling tool over tracing? Profiling tool only captures very
> high-level details of the operations (a single cache operation, for
> example), and does not interconnect operations happened on different nodes.
> For example, a user would not be able to match a long task with a long job
> in that task. In other words, profiling data is always a subset of data
> collected by tracing.
>
> Maybe it makes sense to adopt local log file approach to write spans so we
> can process those spans later to build a report?
>
> Thu, Jun 4, 2020 at 11:16, Nikita Amelchev <nsamelc...@gmail.com>:
>
>> Hi, Igniters.
>>
>> I have implemented cluster profiling and tool to build the performance
>> report. It's ready to be reviewed. [1, 2]
>>
>> Profiling can be managed by JMX bean. I have plans to implement it to
>> control.sh also.
>>
>> Nodes write statistics to the temporary off heap buffer and then one
>> thread flushes to the profiling files. The write mechanics and format
>> is like WAL logging.
>> The report contains the following statistics:
>> - nodes and caches info
>> - cache operations and transaction statistics
>> - SQL and scan queries statistics (include logical and physical reads per
>> query)
>> - tasks and jobs statistics.
>>
>> More details in the IEP [3].
>>
>> [1] https://github.com/apache/ignite/pull/7693
>> [2] https://issues.apache.org/jira/browse/IGNITE-12666
>> [3] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
>>
>> Sun, Apr 26, 2020 at 17:29, Вячеслав Коптилин <slava.kopti...@gmail.com>:
>>>
>>> Hello Nikolay,
>>>
>>>> Who deprecated visor and when? Maybe I miss something?
>>> On the one hand, there was technically no community consensus that this
>>> tool should be obsolete.
>>> On the other hand, my opinion based on the following topic:
>>> http://apache-ignite-developers.2346864.n4.nabble.com/Re-Visor-plugin-tp44879p44939.html
>>> Moreover, it seems to me, currently, the control utility is widely used and
>>> actively developed, instead of the visor.
>>>
>>>> It's true that, for now, Ignite doesn't have "tool strategy" I think it's
>>>> a big issue from the user's point of view.
>>> I absolutely agree with that.
>>>
>>>> We should solve it in the nearest time. Feel free to start this activity
>>> I have no plan at the moment. However, at the first stage, we could
>>> understand the difference between visor and control utility.
>>> All useful features from visor should be moved/implemented in control
>>> utility and after that visor tool and should be marked as
>>> deprecated/obsoleted.
>>>
>>>> You need to throw in control.sh also, which does some kind of statistics
>>>> too, such as idle_verify.
>>>> Please, clarify your idea:
>>>> We should use some info from control.sh to the report?
>>>> The report should be generated by some control.sh subcommand?
>>> If I am not mistaken, the oracle database has AWR tool (mentioned on the
>>> IEP page) which is a command-line utility that generates HTML reports.
>>> I like this idea and I think this is a good approach that can be realized
>>> in the control utility.
>>> If we have a case that cannot be implemented in this way, we have to
>>> clearly states the difference between these tools so as not to confuse our
>>> users.
>>> What do you think?
>>>
>>> Thanks,
>>> Slava.
>>>
>>>
>>> Sat, Apr 25, 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org>:
>>>
>>>> Hello, Slava, Ilya, Denis.
>>>>
>>>> Thanks for joining this discussion!
>>>>
>>>>> - visor (which is deprecated)
>>>>
>>>> Who deprecated visor and when?
>>>> Maybe I miss something?
>>>>
>>>>> - web-console (to be honest, I don't quite understand the status of this
>>>>> tool)
>>>>
>>>> +1.
>>>>
>>>>> I am not against the new tool, I just want to understand the motivation
>>>>> to not improve the existing sub-projects.
>>>>
>>>> It's true that, for now, Ignite doesn't have "tool strategy"
>>>> I think it's a big issue from the user's point of view.
>>>> We should solve it in the nearest time.
>>>> Feel free to start this activity.
>>>>
>>>>> - new ignite-profiling (which is a monitoring tool as well, judging by
>>>>> the provided link [1] )
>>>>
>>>> The general idea is the following:
>>>>
>>>> 1. We should have some profiling mechanism that will generate a node-local
>>>> event log
>>>> 2. We should have a tool that can export events to some third-party
>>>> system. This system can be an Elastic Search(Kibana) or Ignite performance
>>>> report or Kafka log, whatever.
>>>> 3. Ignite performance report, in the first release, should be a "static"
>>>> tool.
>>>> This means we take static logs(that is not rewritten in the analysis
>>>> time) and feed them in the script(or tool or control.sh, whatever)
>>>> The script produces static report that can be used for overall
>>>> performance analysis.
>>>>
>>>> The primary users of this report is a developer of Ignite based
>>>> applications and performance engineers.
>>>>
>>>> Ilya,
>>>>
>>>>> You need to throw in control.sh also, which does some kind of statistics
>>>>> too, such as idle_verify.
>>>>
>>>> Please, clarify your idea:
>>>> We should use some info from control.sh to the report?
>>>> The report should be generated by some control.sh subcommand?
>>>>
>>>>
>>>> Denis,
>>>>
>>>>> Speaking of the probes/statistics collection approach, is it supposed to
>>>>> reuse tracing capabilities that are to be added as part of IEP-35?
>>>>
>>>> For now, we don't have any results of tracing development available in
>>>> Apache Ignite.
>>>> Hopefully, we got some in a couple of weeks.
>>>> After it, we can start a discussion of how to merge two improvements.
>>>>
>>>>
>>>>
>>>>> On Apr 24, 2020, at 20:32, Denis Magda <dma...@apache.org> wrote:
>>>>>
>>>>>>
>>>>>> Tracing is more deeply takes statistics. If it will be possible, I'm for
>>>>>> reuse.
>>>>>
>>>>>
>>>>> Looks like we need to sync up on these activities/initiatives to ensure we
>>>>> don't do a duplicate job. If you think a separate discussion is necessary
>>>>> let's kick it off.
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Fri, Apr 24, 2020 at 9:18 AM Nikita Amelchev <nsamelc...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Denis, Ilya,
>>>>>>
>>>>>> I will try to integrate profiling functionality into control.sh utility.
>>>>>>
>>>>>>> Speaking of the probes/statistics collection approach, is it supposed to
>>>>>>> reuse tracing capabilities that are to be added as part of IEP-35?
>>>>>> Tracing is more deeply takes statistics. If it will be possible, I'm for
>>>>>> reuse.
>>>>>>
>>>>>> Fri, Apr 24, 2020 at 18:59, Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
>>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> I suggest that it's one of the places where it could be put instead of
>>>>>>> adding a new tool.
>>>>>>>
>>>>>>> Regards,
>>>>>>> --
>>>>>>> Ilya Kasnacheev
>>>>>>>
>>>>>>>
>>>>>>> Fri, Apr 24, 2020 at 18:56, Nikita Amelchev <nsamelc...@gmail.com>:
>>>>>>>
>>>>>>>> Ilya,
>>>>>>>>
>>>>>>>> You suggest using control.sh to build the report?
>>>>>>>>
>>>>>>>> Fri, Apr 24, 2020 at 18:20, Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
>>>>>>>>>
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> You need to throw in control.sh also, which does some kind of statistics
>>>>>>>>> too, such as idle_verify.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> --
>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Fri, Apr 24, 2020 at 18:06, Вячеслав Коптилин <slava.kopti...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hello Nikita,
>>>>>>>>>>
>>>>>>>>>> Perhaps, I am missing something...
>>>>>>>>>> Apache Ignite already has a web-console tool. Do we want to improve the
>>>>>>>>>> existing tool instead of creating a new one?
>>>>>>>>>> It seems to me, this can be confusing for users.
>>>>>>>>>> - visor (which is deprecated)
>>>>>>>>>> - web-console (to be honest, I don't quite understand the status of this
>>>>>>>>>> tool)
>>>>>>>>>> - new ignite-profiling (which is a monitoring tool as well, judging by the
>>>>>>>>>> provided link [1] )
>>>>>>>>>>
>>>>>>>>>> I am not against the new tool, I just want to understand the motivation to
>>>>>>>>>> not improve the existing sub-projects.
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> S.
>>>>>>>>>>
>>>>>>>>>> Fri, Apr 24, 2020 at 14:58, Nikita Amelchev <nsamelc...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi, Igniters.
>>>>>>>>>>>
>>>>>>>>>>> I'm working on cluster profiling and the tool for creating a
>>>>>>>>>>> performance report.
>>>>>>>>>>> [1] I have prepared PoC based on performance
>>>>>>>>>>> logging to a separate category of Ignite log. The report contains:
>>>>>>>>>>>
>>>>>>>>>>> - Cache operations and its distribution by types [2]
>>>>>>>>>>> - Transactions and histogram of durations [3]
>>>>>>>>>>> - SQL and Scan query statistics, top of slowest queries, logical and
>>>>>>>>>>> physical reads by query [4]
>>>>>>>>>>> - Compute statistics, top of slowest tasks and their jobs [5]
>>>>>>>>>>> Soon I will add:
>>>>>>>>>>> - Topology and Ignite versions info
>>>>>>>>>>> - Client ID in case of operations from clients
>>>>>>>>>>>
>>>>>>>>>>> For now, I'm developing a binary logging format to reduce the effect
>>>>>>>>>>> on performance. I'll try to reuse Ignite mechanisms.
>>>>>>>>>>>
>>>>>>>>>>> I would like to hear suggestions for the profiling and the performance
>>>>>>>>>>> report.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/148647581/p1.png
>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/148647582/p2.png
>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/148647583/p3.png
>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/152112279/p5.png
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best wishes,
>>>>>>>>>>> Amelchev Nikita
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best wishes,
>>>>>>>> Amelchev Nikita
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best wishes,
>>>>>> Amelchev Nikita
>>>>>>
>>>>
>>
>> --
>> Best wishes,
>> Amelchev Nikita
>>