Hello, Alex. I will fix it shortly.
Ticket - https://issues.apache.org/jira/browse/IGNITE-11945 В Пт, 28/06/2019 в 19:04 +0300, Alexey Goncharuk пишет: > Hello Nikolay, > > From the latest TC runs I can see a sporadic regression in > testParallelStartAndStop and testStartManyCaches tests. Digging deeper, I see > that stopping a single cache when there are many other started caches now > takes a significant amount of time. > > Current suspect is GridCacheAdapter#stop method, which iterates over the > cache metrics and removes them. The issue is that the > MetricsRegistry.withPrefix().getMetrics().forEach() internally uses a > filtered view of all the metrics will effectively iterate over all existing > metrics in the system, which makes a sequential stop of N caches an O(N^2) > complexity. > > I think we can either make the metrics registry to utilize a SkipListMap, > which allow us to iterate only on a subset of the metrics by prefix, or > internally have a trie so that we can remove all metrics with a given prefix > in O(1) time. > > What do you think? > > пн, 10 июн. 2019 г. в 13:49, Nikolay Izhikov <nizhi...@apache.org>: > > Hello, Igniters. > > > > Since Phase 1 will be merged in master soon I've created the ticket [1] for > > Phase 2. > > > > Scope of Phase 2(copy-paste from the ticket) > > > > Ability to collect lists of some internal object Ignite manage. > > Examples of such objects: > > > > * Caches > > * Queries (including continuous queries) > > * Services > > * Compute tasks > > * Distributed Data Structures > > * etc... > > > > > > 1. Fields for each list(that doesn't currently exists in Ignite) will be > > discussed in separate tickets > > 2. Metric Exporters (optionally) can support list export. > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11905 > > > > > > В Вт, 14/05/2019 в 16:42 +0300, Nikolay Izhikov пишет: > > > Ticket for IEP.Phase1 created - > > > https://issues.apache.org/jira/browse/IGNITE-11848 > > > > > > > > > В Пн, 13/05/2019 в 18:06 +0300, Nikolay Izhikov пишет: > > > > Hello, Igniters. > > > > > > > > We have discussed this IEP [1] with Alexey Goncharyuk, Anton > > > > Vinogradov, Andrey Gura, Alexey Scherbakov and Pavel Kovalenko. > > > > > > > > Issues to address: > > > > > > > > 1. Study experience of following libs, tools: > > > > * OpenTracing > > > > * OpenSensus > > > > * DropWizard > > > > > > > > 2. Support histogram sensor: Sensor that collects values that gets into > > > > predefined segments > > > > > > > > 3. Use more widely used naming(like in OpenSensus?) > > > > > > > > 4. Consider the usage of OpenSensus as a default implementation for > > > > local metric storage. > > > > > > > > 5. To measure the performance penalty for metrics for 5_000 caches. > > > > > > > > 6. Some metrics should be part of public API and others are not(may be > > > > changed/removed in release without warnings). > > > > > > > > My plan for Phase #1 is the following: > > > > > > > > 1. Address the issues. > > > > 2. Prepare public API > > > > 3. Prepare PR for monitoring subsystem + existing metrics rewritten > > > > with it. > > > > 4. Prepare a PR with lists of each user API. > > > > 5. Collect feedback for a #4. > > > > 6. Design a log exposer. Consider the usage of JFR format or some other > > > > widely used, tool compatible format. > > > > > > > > [1] > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392 > > > > > > > > В Чт, 02/05/2019 в 14:02 +0300, Nikolay Izhikov пишет: > > > > > Hello, Maxim. > > > > > > > > > > > How will be recorded throughput sensor values which will require an > > > > > > interval for the rate calculations? > > > > > > > > > > I answered to this question in IEP "Design principles": > > > > > > > > > > ``` > > > > > Sensors should contain only raw values. No aggregation of numeric > > > > > metrics on Ignite side. > > > > > Min, max, avg and other functions are the matter of an external > > > > > monitoring system. > > > > > ``` > > > > > > > > > > Throughput is a function `(S(t2) - S(t1))/(t2-t1)` > > > > > where S(t) is the sensor value in some point of time t. > > > > > > > > > > Seems, throughput calculation is a responsibility of an external > > > > > system. > > > > > > > > > > What do you think? > > > > > > > > > > > It seems to me that we can add an additional parameter of > > > > > > `sensitivityLevel` to provide for the user a flexible sensor > > > > > > control (e.g., INFO, WARN, NOTICE, DEBUG). > > > > > > > > > > For now, I think that all sensors and lists will be very(very!) > > > > > lightweight. > > > > > So, we should be able to disable/enable it's, for sure. > > > > > > > > > > But, we should turn off and turn on the whole Ignite subsystem > > > > > for the case we have strong performance limitations for a particular > > > > > workload. > > > > > > > > > > So, we have two "level" of monitoring - INFO and DEBUG(for profiling: > > > > > IEP-35 - Phase 3). > > > > > For example, AFAIK we can't disable current SQL system views(Why > > > > > should we?) > > > > > > > > > > В Вт, 30/04/2019 в 14:33 +0300, Maxim Muzafarov пишет: > > > > > > Hello Nikolay, > > > > > > > > > > > > I've looked through your PRs changes. > > > > > > > > > > > > > Sensors > > > > > > > > > > > > How will be recorded throughput sensor values which will require an > > > > > > interval for the rate calculations? Do we have such an example? For > > > > > > instance, getAllocationRate() or getEvictionRate(). These metrics > > > > > > are > > > > > > out of the scope of current PoC and IEP as they are not related to > > > > > > the > > > > > > user metrics, but it is a good example of a particular metric type. > > > > > > > > > > > > It seems to me that we can add an additional parameter of > > > > > > `sensitivityLevel` to provide for the user a flexible sensor control > > > > > > (e.g., INFO, WARN, NOTICE, DEBUG). > > > > > > > > > > > > It also seems that for the sensors getValue() the completely > > > > > > functional java approach can be used. Am I right? > > > > > > > > > > > > On Mon, 29 Apr 2019 at 11:44, Nikolay Izhikov <nizhi...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > > Hello, Vyacheslav. > > > > > > > > > > > > > > Thanks for the feedback! > > > > > > > > > > > > > > > HttpExposer with Jetty's dependencies should be detached> from > > > > > > > > the core module. > > > > > > > > > > > > > > Agreed. module hierarchy is the essence of the next steps. > > > > > > > For now it just a proof of my ideas for Ignite monitoring we can > > > > > > > discuss. > > > > > > > > > > > > > > > I like your approach with 'wrapper' for monitored objects, like > > > > > > > > don't like using 'ServiceConfiguration' directly as a monitored > > > > > > > > object for services > > > > > > > > > > > > > > Agreed in general. > > > > > > > Seems, choosing the right data to expose is the matter of > > > > > > > separate discussion for each Ignite entities. > > > > > > > I've planned to file tickets for each entity so anyone interested > > > > > > > can share his vision in it. > > > > > > > > > > > > > > > In my opinion, each sensor should have a timestamp. > > > > > > > > > > > > > > I'm not sure that *every* sensor should have directly associated > > > > > > > timestamp. > > > > > > > Seems, we should support sensors without timestamp for a current > > > > > > > monitoring numbers at least. > > > > > > > > > > > > > > > Also, it'd be great to have an ability to store a list of a > > > > > > > > fixed size> of last N sensors > > > > > > > > > > > > > > What use-cases do you know for such sensors? > > > > > > > We have plans to support fixed size lists to show "Last N SQL > > > > > > > queries" or similar data. > > > > > > > Essentially, a sensor is just a single value with the name and > > > > > > > known meaning. > > > > > > > > > > > > > > > It'd be great if you provide a more extended test to show the > > > > > > > > work of> the system. > > > > > > > > > > > > > > Sorry, for that :) > > > > > > > When you run 'MonitoringSelfTest' you should open > > > > > > > http://localhost:8080/ignite/monitoring to view exposed info. > > > > > > > I provide this info in gist - > > > > > > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d > > > > > > > > > > > > > > I will extend this test to print results to console in the next > > > > > > > iterations - stay tuned :) > > > > > > > > > > > > > > В Вс, 28/04/2019 в 23:35 +0300, Vyacheslav Daradur пишет: > > > > > > > > Hi, Nikolay, > > > > > > > > > > > > > > > > I looked through PR and IEP, and I have some comments: > > > > > > > > > > > > > > > > It would be better to implement it as a separate module, I > > > > > > > > can't say > > > > > > > > if it is possible for the main part of monitoring or not, but I > > > > > > > > believe that HttpExposer with Jetty's dependencies should be > > > > > > > > detached > > > > > > > > from the core module. > > > > > > > > > > > > > > > > I like your approach with 'wrapper' for monitored objects, like > > > > > > > > 'ComputeTaskInfo' in PR, and don't like using > > > > > > > > 'ServiceConfiguration' > > > > > > > > directly as a monitored object for services. I believe we > > > > > > > > shouldn't > > > > > > > > mix approaches. It'd be better always use some kind of > > > > > > > > container with > > > > > > > > monitored object's information to work with such data. > > > > > > > > > > > > > > > > In my opinion, each sensor should have a timestamp. Usually > > > > > > > > monitoring > > > > > > > > systems aggregate data and build graphics according to sensors > > > > > > > > timestamp. > > > > > > > > > > > > > > > > Also, it'd be great to have an ability to store a list of a > > > > > > > > fixed size > > > > > > > > of last N sensors, not to miss them without pushing to an > > > > > > > > external > > > > > > > > monitoring system. > > > > > > > > > > > > > > > > It'd be great if you provide a more extended test to show the > > > > > > > > work of > > > > > > > > the system. Everybody who looks to PR needs to run the test and > > > > > > > > get > > > > > > > > the info manually to see the completeness of sensors, this > > > > > > > > might be > > > > > > > > simplified by proper test. > > > > > > > > > > > > > > > > Thank you! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 26, 2019 at 5:56 PM Nikolay Izhikov > > > > > > > > <nizhi...@apache.org> wrote: > > > > > > > > > > > > > > > > > > Hello, Igniters. > > > > > > > > > > > > > > > > > > I've prepared Proof of Concept for IEP-35 [1] > > > > > > > > > PR can be found here - > > > > > > > > > https://github.com/apache/ignite/pull/6510 > > > > > > > > > > > > > > > > > > I've done following changes: > > > > > > > > > > > > > > > > > > 1. `GridMonitoringManager` [2] - simple > > > > > > > > > implementation of manager to store all monitoring info > > > > > > > > > 2. `HttpPullExposerSpi` [3] - pull exposer > > > > > > > > > implementation that can respond with JSON from > > > > > > > > > http://localhost:8080/ignite/monitoring. JSON content can be > > > > > > > > > veiwed in gist [4] > > > > > > > > > 3. Compute task start and finish monitoring in > > > > > > > > > "compute" list [5] > > > > > > > > > 4. Service registration are monitored in "service" > > > > > > > > > list - [6] > > > > > > > > > 5. Current `IgniteSpiMBeanAdapter` rewritten using > > > > > > > > > `GridMonitoringManager` [7] > > > > > > > > > > > > > > > > > > Design principles, monitoring subsystem details and new > > > > > > > > > Ignite entities can be found in IEP [1]. > > > > > > > > > > > > > > > > > > My next steps will be: > > > > > > > > > > > > > > > > > > 1. Implementation of JMX exposer > > > > > > > > > 2. Registration of all "lists" and "sensor groups" as > > > > > > > > > a SQL System view. > > > > > > > > > 3. Add monitoring for all unmonitoring Ignite API. > > > > > > > > > (described in IEP). > > > > > > > > > 4. Rewrite existing jmx metrics using > > > > > > > > > GridMonitoringManager. > > > > > > > > > > > > > > > > > > Please, share you thoughts. > > > > > > > > > > > > > > > > > > Part of JSON file: > > > > > > > > > ``` > > > > > > > > > "COMPUTE": { > > > > > > > > > "tasks": { > > > > > > > > > "name": "tasks", > > > > > > > > > "rows": [ > > > > > > > > > { > > > > > > > > > "id": "0798817a-eeec-4386-9af7-94edb39ffced", > > > > > > > > > "sessionId": > > > > > > > > > "a1814f95a61-912451ff-ca7b-4764-a7fd-728f6a900000", > > > > > > > > > "data": { > > > > > > > > > "taskClasName": > > > > > > > > > "org.apache.ignite.monitoring.MonitoringSelfTest$$Lambda$145/1500885480", > > > > > > > > > "startTime": 1556287337944, > > > > > > > > > "timeout": 9223372036854776000, > > > > > > > > > "execName": null > > > > > > > > > }, > > > > > > > > > "name": "anotherBroadcast" > > > > > > > > > } > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392 > > > > > > > > > [2] > > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-ec7d5cf5e35b99303deb9accee153c50R34 > > > > > > > > > [3] > > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-32239c45e0ae3b692af2eae7078e1436R47 > > > > > > > > > [4] > > > > > > > > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d > > > > > > > > > [5] > > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-d651ed29d07bd0c5ce291654a3254cc0R749 > > > > > > > > > [6] > > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-0b4e54fbda2b0da1c10eff48416336f6R1606 > > > > > > > > > [7] > > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-4398bf118150500e059069b3a1638ec7R61 > > > > > > > > > > > > > > > > > > > > > > > >
signature.asc
Description: This is a digitally signed message part