> If a cache has some percentage of relatively slow transactions, this is a
> trigger to make a deeper investigation.
It will also be visible in other metrics. So cache operation metrics are
still useless, because they are transitive values.

>> 1. Measure some important internals (WAL operations, checkpoint time, etc.)
>> because they can point to real problems.

> We have already implemented it.

I'm not saying that it isn't implemented. It is just an example of things
that should be measured. All other metrics depend on internals.

>> 2. Measure business operations in user context, not cache API operations.

> Why do you think these approaches should exclude one another?

Because one of them is useless.

On Fri, Dec 20, 2019 at 1:43 PM Николай Ижиков <nizhi...@apache.org> wrote:
>
> Hello, Andrey.
>
> > Where is the sense in this value? I explained why these metrics are
> > relatively useless.
>
> I don’t agree with you.
> I believe they are not useless for a user.
> And I will try to explain why I think so.
>
> > But a user can't distinguish one transaction from another, so his
> > knowledge definitely doesn't make sense.
>
> Users shouldn’t have to distinguish.
> If a cache has some percentage of relatively slow transactions, this is a
> trigger to make a deeper investigation.
>
> > 1. Measure some important internals (WAL operations, checkpoint time, etc.)
> > because they can point to real problems.
>
> We have already implemented it.
> What metrics are missing for internal processes?
>
> > 2. Measure business operations in user context, not cache API operations.
>
> Why do you think these approaches should exclude one another?
> Users definitely should measure whole business transaction performance.
>
> I think we should provide a way to measure the part of the business
> transaction that relates to Ignite.
>
>
> > On Dec 20, 2019, at 13:02, Andrey Gura <ag...@apache.org> wrote:
> >
> >> The goal of the proposed metrics is to measure whole cache operations
> >> behavior.
> >> It provides some kind of statistics (histograms) for it.
> >
> > Nikolay, reformulating doesn't make metrics more meaningful.
> > Seriously :)
> >
> >> Yes, metrics will evaluate API call performance
> >
> > And what? Where is the sense in this value? I explained why these metrics
> > are relatively useless.
> >
> >> These are metrics of client-side operation performance.
> >
> > Again. It's just a number without any sense.
> >
> >> I think a specific user has knowledge of what his transactions are.
> >
> > Maybe. But a user can't distinguish one transaction from another, so
> > his knowledge definitely doesn't make sense.
> >
> >> From these metrics he can answer the question «If my transaction
> >> includes cacheXXX, how long does it usually take?»
> >
> > Actually not. The same caches can be involved in a dozen
> > transactions and there is no way to understand which transactions are
> > slow or fast. It is useless.
> >
> >> I disagree here.
> >> If you have a better approach to measuring cache operation performance,
> >> please share your vision.
> >
> > I already wrote about a better approach. Two main points:
> >
> > 1. Measure some important internals (WAL operations, checkpoint time,
> > etc.) because they can point to real problems.
> > 2. Measure business operations in user context, not cache API operations.
> >
> > So what do we have? We have useless metrics that are duplicated by useless
> > histograms.
> >
> > We should reconsider the approach to metrics and performance measurement.
> > It is a hard and long task. There is no need to commit tons of useless
> > metrics that just decrease performance.
> >
> > Sorry for some sarcasm, but I really believe in my opinion. The metrics
> > problem has existed for a very long time and the existing metrics have
> > been discussed many times. No one can explain these metrics to users
> > because that requires too much additional knowledge about internals. And
> > the metric value itself depends on many aspects of internals. That makes
> > interpretation impossible. And it's a good time to remove them (in AI 3.0,
> > due to backward compatibility).
> >
> > On Thu, Dec 19, 2019 at 9:09 PM Николай Ижиков <nizhikov....@gmail.com>
> > wrote:
> >>
> >> Hello, Andrey.
> >>
> >> The goal of the proposed metrics is to measure whole cache operations
> >> behavior.
> >> It provides some kind of statistics (histograms) for it.
> >> For more fine-grained analysis one will use tracing or other «go
> >> deeper» tools.
> >>
> >>>> Measured for API calls on the caller node side
> >>> Values will be the same only for cases when the node is remote relative
> >>> to the data
> >>
> >> Yes, metrics will evaluate API call performance.
> >> I think this is the most valuable information from a user's point of view.
> >>
> >> A regular user wants to know how fast his cache operation performs.
> >> And these metrics provide the answer.
> >>
> >>> For a regular data node (server node), timing will depend on the answers
> >>> to these questions:
> >>
> >> I think these answers are always available.
> >> I can barely imagine a scenario where one monitors a «black box» cluster
> >> and doesn’t know them.
> >> Even so, all answers are provided through the system view we brought to
> >> Ignite :)
> >>
> >>> What is transaction commit or rollback time?
> >>
> >> These are metrics of client-side operation performance.
> >>
> >> I think a specific user has knowledge of what his transactions are.
> >> From these metrics he can answer the question «If my transaction
> >> includes cacheXXX, how long does it usually take?»
> >> I think it’s very valuable knowledge.
> >>
> >>> It will be implemented for most types of messages.
> >>
> >> Good, let’s do it?
> >>
> >>> So, from my point of view, commits for get/put/remove and commit/rollback
> >>> should be reverted.
> >>
> >> I disagree here.
> >> If you have a better approach to measuring cache operation performance,
> >> please share your vision.
> >>
> >>> On Dec 19, 2019, at 16:03, Andrey Gura <ag...@apache.org> wrote:
> >>>
> >>> From my point of view, Ignite should provide meaningful metrics for
> >>> internal components that could be useful for monitoring and analysis.
> >>> All suggested options are meaningless in a sense. Below I'll try to
> >>> explain why.
> >>>
> >>>> * `get`, `put`, `remove` time histograms. Measured for API calls on the
> >>>> caller node side.
> >>>> Implemented in [1], commit [2].
> >>>
> >>> All cache operations in Ignite are distributed. So each value measured
> >>> for some cache operation will vary depending on where the operation is
> >>> actually performed. Values will be the same only for cases when the node
> >>> is remote relative to the data (e.g. client node).
> >>>
> >>> For a regular data node (server node), timing will depend on the answers
> >>> to these questions:
> >>>
> >>> - is the node primary for a particular key or not? (for all operations)
> >>> - how many backups are configured for the cache? (for put and remove)
> >>> - what write synchronization mode is configured for the particular cache?
> >>> (for put and remove)
> >>> - is readFromBackup enabled for the cache? (for get)
> >>>
> >>> Neither Ignite users nor Ignite developers can make any decision based
> >>> on these metrics.
> >>>
> >>>> * `commit`, `rollback` time histograms. Measured for API calls on the
> >>>> caller node side [3].
> >>>
> >>> What is transaction commit or rollback time? How is it calculated in
> >>> Ignite now? What actions are included in a transaction? What actions not
> >>> related to the cache are executed during transactions?
> >>>
> >>> There is no sense in the time of a transaction commit or rollback
> >>> because there is no way to understand which transaction was
> >>> performed in a particular period of time. Usually there are a lot of
> >>> transactions and we can't distinguish them from each other.
> >>>
> >>> Moreover, a transaction is usually treated as a business operation.
> >>> So the only way to measure performance properly is to measure business
> >>> operation time. That is, the user should create their own metrics set
> >>> for some business API.
> >>>
> >>> Further. What about cross-cache transactions? At the moment, tx
> >>> commit/rollback time will be added to the corresponding metrics of each
> >>> cache involved in the transaction. The *same time* for *each cache*.
> >>> Absolutely meaningless.
> >>>
> >>> Again, neither Ignite users nor Ignite developers can make any decision
> >>> based on these metrics. But users can create their own metrics set.
> >>>
> >>>> * histograms that measure the time of processing `get`, `put`, `remove`,
> >>>> `commit`, `rollback` messages on affinity nodes (primary and backups).
> >>>> Ticket doesn't exist for it.
> >>>
> >>> It will be implemented for most types of messages.
> >>>
> >>> Metrics, application monitoring, performance analysis and measurement
> >>> are a little harder than they sound. Therefore, we must approach this
> >>> issue more carefully.
> >>> Blindly adding new types of metrics will not only fail to improve the
> >>> situation, but will also worsen the overall performance of the system
> >>> because metric calculation is always on the hot path.
> >>>
> >>> So, from my point of view, commits for get/put/remove and
> >>> commit/rollback should be reverted.
> >>>
> >>> On Mon, Dec 16, 2019 at 5:39 PM Nikita Amelchev <nsamelc...@gmail.com>
> >>> wrote:
> >>>>
> >>>> I think these metrics are useful.
> >>>>
> >>>> I have prepared PR [1] for commit and rollback histograms. [2]
> >>>> Nikolay, could you take a look, please?
> >>>>
> >>>> If you do not mind, I will try to add affinity-nodes cache metrics:
> >>>>>> * histograms that measure the time of processing `get`, `put`,
> >>>>>> `remove`, `commit`, `rollback` messages on affinity nodes (primary and
> >>>>>> backups). Ticket doesn't exist for it.
> >>>>
> >>>> I have filed a ticket for it.
> >>>> [3]
> >>>>
> >>>> [1] https://github.com/apache/ignite/pull/7141
> >>>> [2] https://issues.apache.org/jira/browse/IGNITE-12450
> >>>> [3] https://issues.apache.org/jira/browse/IGNITE-12453
> >>>>
> >>>> Mon, Dec 16, 2019 at 11:07, Alexei Scherbakov
> >>>> <alexey.scherbak...@gmail.com>:
> >>>>>
> >>>>> I think they are very useful.
> >>>>>
> >>>>> Mon, Dec 16, 2019 at 10:51, Николай Ижиков <nizhi...@apache.org>:
> >>>>>
> >>>>>> Hello, Alexei.
> >>>>>>
> >>>>>> Thanks for the link to the ticket, I labeled it with the IEP-35 label.
> >>>>>> What do you think about the proposed metrics set?
> >>>>>>
> >>>>>>> On Dec 16, 2019, at 10:29, Alexei Scherbakov
> >>>>>>> <alexey.scherbak...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Nikolay,
> >>>>>>>
> >>>>>>> What about batch operations?
> >>>>>>>
> >>>>>>> For message processing the ticket does exist and even has an
> >>>>>>> implementation from before the new metrics API times [1]
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-10418
> >>>>>>>
> >>>>>>> Mon, Dec 16, 2019 at 10:12, Николай Ижиков <nizhi...@apache.org>:
> >>>>>>>
> >>>>>>>> Hello, Igniters.
> >>>>>>>>
> >>>>>>>> I want to give users an answer to the following question: "How do
> >>>>>>>> cache API operations perform?"
> >>>>>>>> It seems we need to implement metrics for basic cache API
> >>>>>>>> operations like get, put, remove for it.
> >>>>>>>>
> >>>>>>>> I think we should provide the following metrics:
> >>>>>>>>
> >>>>>>>> * `get`, `put`, `remove` time histograms. Measured for API calls on
> >>>>>>>> the caller node side.
> >>>>>>>> Implemented in [1], commit [2].
> >>>>>>>>
> >>>>>>>> * `commit`, `rollback` time histograms. Measured for API calls on the
> >>>>>>>> caller node side [3].
> >>>>>>>>
> >>>>>>>> * histograms that measure the time of processing `get`, `put`,
> >>>>>>>> `remove`, `commit`, `rollback` messages on affinity nodes (primary
> >>>>>>>> and backups).
> >>>>>>>> Ticket doesn't exist for it.
> >>>>>>>>
> >>>>>>>> What do you think?
> >>>>>>>>
> >>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12219
> >>>>>>>> [2] https://github.com/apache/ignite/commit/e66bbef97b2cef73a533ce8a506ec479852cb364
> >>>>>>>> [3] https://issues.apache.org/jira/browse/IGNITE-12450
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Alexei Scherbakov
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Best regards,
> >>>>> Alexei Scherbakov
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best wishes,
> >>>> Amelchev Nikita
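
For readers of this thread: the second point in the discussion above (measure a business operation in user context rather than individual cache API calls) could be sketched roughly as follows. This is only an illustration in Python, not Ignite code and not the Ignite metrics API. `TimeHistogram`, `timed`, `transfer_money` and the bucket bounds are all hypothetical names and values chosen for the example.

```python
import bisect
import time
from contextlib import contextmanager


class TimeHistogram:
    """Counts durations into buckets delimited by upper bounds (milliseconds)."""

    def __init__(self, bounds_ms):
        self.bounds_ms = sorted(bounds_ms)
        # One counter per bound plus a final overflow bucket.
        self.counts = [0] * (len(self.bounds_ms) + 1)

    def record(self, duration_ms):
        # bisect_left places a duration equal to a bound into that bound's bucket.
        self.counts[bisect.bisect_left(self.bounds_ms, duration_ms)] += 1


@contextmanager
def timed(histogram, clock=time.perf_counter):
    """Records the wall-clock time of the enclosed block, even if it raises."""
    start = clock()
    try:
        yield
    finally:
        histogram.record((clock() - start) * 1000.0)


# Hypothetical business operation; a plain dict stands in for the caches.
def transfer_money(accounts, src, dst, amount):
    accounts[src] -= amount
    accounts[dst] += amount


# One histogram per business operation, owned by the application.
transfer_hist = TimeHistogram([1, 10, 100])
accounts = {"a": 100, "b": 0}

with timed(transfer_hist):
    transfer_money(accounts, "a", "b", 30)
```

The design choice here is that the histogram is keyed by the business operation, so a slow bucket points at one known operation rather than at every transaction that happens to touch a given cache.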