Hello, Alexey.

Thanks for the review.

My understanding is the following:

We will have 3 tools to find issues in a cluster, each more in-depth than the previous one:

1. Metrics + System views - data that describes Ignite entities at a very high level.

2. Profiling - a tool to find out which specific queries or transactions are slow.
In many cases, this knowledge is enough to fix the issue (rewrite the query,
redesign the transaction flow, etc.).

3. Tracing - a tool to find out why one out of 1000 identical queries was slow.
It gives the most detailed view of Ignite internal processes.

> For example, a user would not be able to match a long task with a long job in 
> that task.

This is not true.
The profiling report will aggregate data from all nodes.
So it will contain both

 * the total time of the task
 * the time of each job in the task.
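
As a rough illustration of that aggregation (a minimal Python sketch, not the code of the actual tool): if every node logs its task and job records with a shared task session id, the report builder can group them across nodes. The record layout, the `build_task_report` and `slowest_job` names, and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskReport:
    """Per-task entry of the merged report."""
    name: str = ""
    duration_ms: int = 0
    # (node_id, job duration in ms) for every job of the task
    jobs: list = field(default_factory=list)


def build_task_report(node_logs):
    """Merge per-node records into one report keyed by task session id.

    node_logs maps node id -> list of records. Each record is a tuple
    (kind, session_id, name, duration_ms) where kind is "task" or "job".
    This record layout is hypothetical, not the real profiling file format.
    """
    report = defaultdict(TaskReport)
    for node_id, records in node_logs.items():
        for kind, session_id, name, duration_ms in records:
            entry = report[session_id]
            if kind == "task":
                entry.name = name
                entry.duration_ms = duration_ms
            elif kind == "job":
                entry.jobs.append((node_id, duration_ms))
    return dict(report)


def slowest_job(entry):
    """Return (node_id, duration_ms) of the slowest job, or None."""
    return max(entry.jobs, key=lambda job: job[1], default=None)


# Example: the task was started on node A, its jobs ran on nodes B and C.
node_logs = {
    "A": [("task", "s1", "PriceTask", 950)],
    "B": [("job", "s1", "PriceJob", 900)],
    "C": [("job", "s1", "PriceJob", 40)],
}
report = build_task_report(node_logs)
```

With the records grouped this way, a long task and the long job inside it end up in the same report entry, even though they were logged on different nodes.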


> On June 8, 2020, at 12:52, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> 
> Nikita, Igniters,
> 
> I left a few comments on the tool itself in the PR.
> 
> However, I would like to reiterate and discuss why a user would prefer to
> use the profiling tool over tracing? The profiling tool only captures very
> high-level details of the operations (a single cache operation, for
> example), and does not interconnect operations that happened on different nodes.
> For example, a user would not be able to match a long task with a long job
> in that task. In other words, profiling data is always a subset of data
> collected by tracing.
> 
> Maybe it makes sense to adopt a local log file approach to write spans so we
> can process those spans later to build a report?
> 
> Thu, Jun 4, 2020 at 11:16, Nikita Amelchev <nsamelc...@gmail.com>:
> 
>> Hi, Igniters.
>> 
>> I have implemented cluster profiling and a tool to build the performance
>> report. It's ready to be reviewed. [1, 2]
>> 
>> Profiling can be managed via a JMX bean. I also plan to add it to
>> control.sh.
>> 
>> Nodes write statistics to a temporary off-heap buffer, and then a single
>> thread flushes it to the profiling files. The write mechanics and format
>> are similar to WAL logging.
>> The report contains the following statistics:
>> - nodes and caches info
>> - cache operations and transaction statistics
>> - SQL and scan queries statistics (including logical and physical reads per
>> query)
>> - tasks and jobs statistics.
>> 
>> More details in the IEP [3].
>> 
>> [1] https://github.com/apache/ignite/pull/7693
>> [2] https://issues.apache.org/jira/browse/IGNITE-12666
>> [3] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
>> 
>> Sun, Apr 26, 2020 at 17:29, Вячеслав Коптилин <slava.kopti...@gmail.com>:
>>> 
>>> Hello Nikolay,
>>> 
>>>> Who deprecated visor and when? Maybe I miss something?
>>> On the one hand, there was technically no community consensus that this
>>> tool should be obsolete.
>>> On the other hand, my opinion is based on the following topic:
>>> 
>>> http://apache-ignite-developers.2346864.n4.nabble.com/Re-Visor-plugin-tp44879p44939.html
>>> Moreover, it seems to me that the control utility is currently widely used and
>>> actively developed, unlike the visor.
>>> 
>>>> It's true that, for now, Ignite doesn't have "tool strategy" I think it's
>>>> a big issue from the user's point of view.
>>> I absolutely agree with that.
>>> 
>>>> We should solve it in the nearest time. Feel free to start this activity
>>> I have no plan at the moment. However, as a first step, we could
>>> work out the difference between visor and the control utility.
>>> All useful features from visor should be moved/implemented in the control
>>> utility, and after that the visor tool should be marked as
>>> deprecated/obsolete.
>>> 
>>>> You need to throw in control.sh also, which does some kind of statistics
>>>> too, such as idle_verify.
>>>> Please, clarify your idea:
>>>>   We should use some info from control.sh to the report?
>>>>   The report should be generated by some control.sh subcommand?
>>> If I am not mistaken, Oracle Database has the AWR tool (mentioned on the
>>> IEP page), which is a command-line utility that generates HTML reports.
>>> I like this idea and I think this is a good approach that can be realized
>>> in the control utility.
>>> If we have a case that cannot be implemented in this way, we have to
>>> clearly state the difference between these tools so as not to confuse our
>>> users.
>>> What do you think?
>>> 
>>> Thanks,
>>> Slava.
>>> 
>>> 
>>> Sat, Apr 25, 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org>:
>>> 
>>>> Hello, Slava, Ilya, Denis.
>>>> 
>>>> Thanks for joining this discussion!
>>>> 
>>>>> - visor (which is deprecated)
>>>> 
>>>> Who deprecated visor and when?
>>>> Maybe I miss something?
>>>> 
>>>>> - web-console (to be honest, I don't quite understand the status of this
>>>>> tool)
>>>> 
>>>> +1.
>>>> 
>>>>> I am not against the new tool, I just want to understand the motivation
>>>>> to not improve the existing sub-projects.
>>>> 
>>>> It's true that, for now, Ignite doesn't have "tool strategy"
>>>> I think it's a big issue from the user's point of view.
>>>> We should solve it in the nearest time.
>>>> Feel free to start this activity.
>>>> 
>>>>> - new ignite-profiling (which is a monitoring tool as well, judging by
>>>>> the provided link [1] )
>>>> 
>>>> The general idea is the following:
>>>> 
>>>> 1. We should have some profiling mechanism that will generate a node-local
>>>> event log.
>>>> 2. We should have a tool that can export events to some third-party
>>>> system. This system can be Elasticsearch (Kibana), the Ignite performance
>>>> report, a Kafka log, whatever.
>>>> 3. The Ignite performance report, in the first release, should be a "static"
>>>> tool.
>>>>    This means we take static logs (that are not rewritten during the
>>>> analysis) and feed them to the script (or tool or control.sh, whatever).
>>>>    The script produces a static report that can be used for overall
>>>> performance analysis.
>>>> 
>>>> The primary users of this report are developers of Ignite-based
>>>> applications and performance engineers.
>>>> 
>>>> Ilya,
>>>> 
>>>>> You need to throw in control.sh also, which does some kind of statistics
>>>>> too, such as idle_verify.
>>>> 
>>>> Please, clarify your idea:
>>>>    We should use some info from control.sh to the report?
>>>>    The report should be generated by some control.sh subcommand?
>>>> 
>>>> 
>>>> Denis,
>>>> 
>>>>> Speaking of the probes/statistics collection approach, is it supposed to
>>>>> reuse tracing capabilities that are to be added as part of IEP-35?
>>>> 
>>>> For now, we don't have any results of tracing development available in
>>>> Apache Ignite.
>>>> Hopefully, we will get some in a couple of weeks.
>>>> After that, we can start a discussion of how to merge the two improvements.
>>>> 
>>>> 
>>>> 
>>>>> On Apr 24, 2020, at 20:32, Denis Magda <dma...@apache.org> wrote:
>>>>> 
>>>>>> 
>>>>>> Tracing collects statistics in more depth. If it is possible, I'm for
>>>>>> reuse.
>>>>> 
>>>>> 
>>>>> Looks like we need to sync up on these activities/initiatives to ensure we
>>>>> don't do a duplicate job. If you think a separate discussion is necessary
>>>>> let's kick it off.
>>>>> 
>>>>> -
>>>>> Denis
>>>>> 
>>>>> 
>>>>> On Fri, Apr 24, 2020 at 9:18 AM Nikita Amelchev <nsamelc...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Denis, Ilya,
>>>>>> 
>>>>>> I will try to integrate profiling functionality into control.sh utility.
>>>>>> 
>>>>>>> Speaking of the probes/statistics collection approach, is it supposed to
>>>>>>> reuse tracing capabilities that are to be added as part of IEP-35?
>>>>>> Tracing collects statistics in more depth. If it is possible, I'm for
>>>>>> reuse.
>>>>>> 
>>>>>> Fri, Apr 24, 2020 at 18:59, Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
>>>>>>> 
>>>>>>> Hello!
>>>>>>> 
>>>>>>> I suggest that it's one of the places where it could be put instead of
>>>>>>> adding a new tool.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> --
>>>>>>> Ilya Kasnacheev
>>>>>>> 
>>>>>>> 
>>>>>>> Fri, Apr 24, 2020 at 18:56, Nikita Amelchev <nsamelc...@gmail.com>:
>>>>>>> 
>>>>>>>> Ilya,
>>>>>>>> 
>>>>>>>> You suggest using control.sh to build the report?
>>>>>>>> 
>>>>>>>> Fri, Apr 24, 2020 at 18:20, Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
>>>>>>>>> 
>>>>>>>>> Hello!
>>>>>>>>> 
>>>>>>>>> You need to throw in control.sh also, which does some kind of statistics
>>>>>>>>> too, such as idle_verify.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> --
>>>>>>>>> Ilya Kasnacheev
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Fri, Apr 24, 2020 at 18:06, Вячеслав Коптилин <slava.kopti...@gmail.com>:
>>>>>>>>> 
>>>>>>>>>> Hello Nikita,
>>>>>>>>>> 
>>>>>>>>>> Perhaps, I am missing something...
>>>>>>>>>> Apache Ignite already has a web-console tool. Do we want to improve the
>>>>>>>>>> existing tool instead of creating a new one?
>>>>>>>>>> It seems to me, this can be confusing for users.
>>>>>>>>>> - visor (which is deprecated)
>>>>>>>>>> - web-console (to be honest, I don't quite understand the status of this
>>>>>>>>>> tool)
>>>>>>>>>> - new ignite-profiling (which is a monitoring tool as well, judging by the
>>>>>>>>>> provided link [1] )
>>>>>>>>>> 
>>>>>>>>>> I am not against the new tool, I just want to understand the motivation to
>>>>>>>>>> not improve the existing sub-projects.
>>>>>>>>>> 
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> S.
>>>>>>>>>> 
>>>>>>>>>> Fri, Apr 24, 2020 at 14:58, Nikita Amelchev <nsamelc...@gmail.com>:
>>>>>>>>>> 
>>>>>>>>>>> Hi, Igniters.
>>>>>>>>>>> 
>>>>>>>>>>> I'm working on cluster profiling and the tool for creating a
>>>>>>>>>>> performance report. [1] I have prepared a PoC based on performance
>>>>>>>>>>> logging to a separate category of the Ignite log. The report contains:
>>>>>>>>>>> 
>>>>>>>>>>> - Cache operations and their distribution by type [2]
>>>>>>>>>>> - Transactions and a histogram of durations [3]
>>>>>>>>>>> - SQL and Scan query statistics, top slowest queries, logical and
>>>>>>>>>>> physical reads by query [4]
>>>>>>>>>>> - Compute statistics, top slowest tasks and their jobs [5]
>>>>>>>>>>> Soon I will add:
>>>>>>>>>>> - Topology and Ignite versions info
>>>>>>>>>>> - Client ID in case of operations from clients
>>>>>>>>>>> 
>>>>>>>>>>> For now, I'm developing a binary logging format to reduce the effect
>>>>>>>>>>> on performance. I'll try to reuse Ignite mechanisms.
>>>>>>>>>>> 
>>>>>>>>>>> I would like to hear suggestions for the profiling and the performance
>>>>>>>>>>> report.
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/148647581/p1.png
>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/148647582/p2.png
>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/148647583/p3.png
>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool?preview=/145723859/152112279/p5.png
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Best wishes,
>>>>>>>>>>> Amelchev Nikita
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best wishes,
>>>>>>>> Amelchev Nikita
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best wishes,
>>>>>> Amelchev Nikita
>>>>>> 
>>>> 
>>>> 
>> 
>> 
>> 
>> --
>> Best wishes,
>> Amelchev Nikita
>> 
