> If a cache has some percentage of relatively slow transactions, this is a 
> trigger for a deeper investigation.

It will also be visible in other metrics. So cache operation metrics are
still useless, because they are transitive (derived) values.

>> 1. Measure some important internals (WAL operations, checkpoint time, etc) 
>> because it can talk about real problems.

> We already implement it.

I am not saying that it isn't implemented. It is just an example of things
that should be measured. All other metrics depend on internals.

>> 2. Measure business operations in user context, not cache API operations.

> Why do you think these approaches should exclude one another?

Because one of them is useless.
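
To illustrate point 2, here is a minimal sketch of measuring a business operation in the user's own code instead of individual cache API calls. The class name `BusinessOpHistogram`, the bucket bounds, and the `placeOrder` call mentioned below are hypothetical and are not part of the Ignite API; this is only one way a user could build such a metric.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Supplier;

/** Hypothetical application-side helper (not an Ignite class): latency histogram for one business operation. */
class BusinessOpHistogram {
    /** Ascending upper bounds of the buckets, in milliseconds. */
    private final long[] boundsMillis;

    /** One counter per bound plus a final overflow bucket. */
    private final AtomicLongArray counts;

    BusinessOpHistogram(long... boundsMillis) {
        this.boundsMillis = boundsMillis.clone();
        this.counts = new AtomicLongArray(boundsMillis.length + 1);
    }

    /** Puts a duration into the first bucket whose upper bound is not exceeded. */
    void record(long durationMillis) {
        int i = 0;
        while (i < boundsMillis.length && durationMillis > boundsMillis[i])
            i++;
        counts.incrementAndGet(i);
    }

    /** Runs the whole business operation and records its wall-clock time, even if it throws. */
    <T> T time(Supplier<T> businessOp) {
        long start = System.nanoTime();
        try {
            return businessOp.get();
        } finally {
            record(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
    }

    /** Returns a copy of the current bucket counters. */
    long[] snapshot() {
        long[] res = new long[counts.length()];
        for (int i = 0; i < res.length; i++)
            res[i] = counts.get(i);
        return res;
    }
}
```

An application would wrap the complete business call, for example `histogram.time(() -> placeOrder(order))`, so the recorded time covers every cache and every non-cache step of that operation as a single unit.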

On Fri, Dec 20, 2019 at 1:43 PM Николай Ижиков <nizhi...@apache.org> wrote:
>
> Hello, Andrey.
>
> > What is the sense of this value? I explained why these metrics are relatively 
> > useless.
>
> I don’t agree with you.
> I believe they are not useless for a user.
> And I try to explain why I think so.
>
> > But the user can't distinguish one transaction from another, so his knowledge 
> > is definitely of no use.
>
> Users shouldn’t distinguish.
> If a cache has some percentage of relatively slow transactions, this is a 
> trigger for a deeper investigation.
>
> > 1. Measure some important internals (WAL operations, checkpoint time, etc) 
> > because it can talk about real problems.
>
> We already implement it.
> What metrics are missing for internal processes?
>
> > 2. Measure business operations in user context, not cache API operations.
>
> Why do you think these approaches should exclude one another?
> Users definitely should measure whole business transaction performance.
>
> I think we should provide a way to measure part of the business transaction 
> that relates to the Ignite.
>
>
> > On Dec 20, 2019, at 13:02, Andrey Gura <ag...@apache.org> wrote:
> >
> >> The goal of the proposed metrics is to measure whole cache operations 
> >> behavior.
> >> It provides some kind of statistics(histograms) for it.
> >
> > Nikolay, reformulating doesn't make metrics more meaningful. Seriously :)
> >
> >> Yes, metrics will evaluate API call performance
> >
> > > So what? What is the sense of this value? I explained why these metrics
> > > are relatively useless.
> >
> >> These are metrics of client-side operation performance.
> >
> > Again. It's just a number without any sense.
> >
> >> I think a specific user has knowledge - what are his transactions.
> >
> > > Maybe. But the user can't distinguish one transaction from another, so
> > > his knowledge is definitely of no use.
> >
> > >> From these metrics he can answer the question «If my transaction 
> > >> includes cacheXXX, how long does it usually take?»
> >
> > > Actually not. The same caches can be involved in dozens of
> > > transactions, and there is no way to tell which transactions are
> > > slow or fast. It is useless.
> >
> >> I disagree here.
> >> If you have a better approach to measure cache operations performance - 
> >> please, share your vision.
> >
> > I already wrote about better approach. Two main points:
> >
> > 1. Measure some important internals (WAL operations, checkpoint time,
> > etc) because it can talk about real problems.
> > 2. Measure business operations in user context, not cache API operations.
> >
> > > So what do we have? We have useless metrics that are duplicated by useless
> > > histograms.
> >
> > > We should reconsider the approach to metrics and performance measurement. It
> > > is a hard and long task. There is no need to commit tons of useless
> > > metrics that just decrease performance.
> > >
> > > Sorry for some sarcasm, but I really believe in my opinion. The metrics
> > > problem has existed for a very long time, and the existing metrics have
> > > been discussed many times. No one can explain these metrics to users
> > > because that requires too much additional knowledge about internals. And
> > > a metric's value itself depends on many aspects of internals. That makes
> > > the metrics impossible to interpret. And it's a good time to remove them
> > > (in AI 3.0, because of backward compatibility).
> >
> > On Thu, Dec 19, 2019 at 9:09 PM Николай Ижиков <nizhikov....@gmail.com> 
> > wrote:
> >>
> >> Hello, Andrey.
> >>
> >> The goal of the proposed metrics is to measure whole cache operations 
> >> behavior.
> >> It provides some kind of statistics(histograms) for it.
> >> For more fine-grained analysis one will use tracing or other «go 
> >> deeper» tools.
> >>
> >>>> Measured for API calls on the caller node side
> >>> Values will be the same only for cases when the node is remote relative to the data
> >>
> >> Yes, metrics will evaluate API call performance.
> >> I think this is the most valuable information from a user's point of view.
> >>
> >> Regular user wants to know how fast his cache operation performs.
> >> And these metrics provide the answer.
> >>
> >>> For a regular data node (server node), timing will depend on the answers to 
> >>> these questions:
> >>
> >> I think these answers are always available.
> >> I can barely imagine a scenario where one monitors a «black box» cluster and 
> >> doesn't know them.
> >> Even so, all the answers are provided through the system views we brought to 
> >> Ignite :)
> >>
> >>> What is transaction commit or rollback time?
> >>
> >> These are metrics of client-side operation performance.
> >>
> >> I think a specific user has knowledge - what are his transactions.
> >> From these metrics he can answer the question «If my transaction 
> >> includes cacheXXX, how long does it usually take?»
> >> I think it’s very valuable knowledge.
> >>
> >>> It will be implemented for most types of messages.
> >>
> >> Good, let’s do it?
> >>
> >>> So, from my point of view, commits for get/put/remove and commit/rollback 
> >>> should be reverted.
> >>
> >> I disagree here.
> >> If you have a better approach to measure cache operations performance - 
> >> please, share your vision.
> >>
> >>> On Dec 19, 2019, at 16:03, Andrey Gura <ag...@apache.org> wrote:
> >>>
> >>> From my point of view, Ignite should provide meaningful metrics for
> >>> internal components that could be useful for monitoring and analysis.
> >>> All suggested options are meaningless in a sense. Below I'll try
> >>> to explain why.
> >>>
> >>>> * `get`, `put`, `remove` time histograms. Measured for API calls on the 
> >>>> caller node side.
> >>>>  Implemented in [1], commit [2].
> >>>
> >>> All cache operations in Ignite are distributed. So each value measured
> >>> for some cache operation will vary depending on where the operation is
> >>> actually performed. Values will be the same only for cases when the node
> >>> is remote relative to the data (e.g. a client node).
> >>>
> >>> For a regular data node (server node), timing will depend on the answers to 
> >>> these questions:
> >>>
> >>> - is node primary for particular key or not? (for all operations)
> >>> - how many backups configured for the cache? (for put and remove)
> >>> - what write synchronization mode is configured for particular cache?
> >>> (for put and remove)
> >>> - is readFromBackup enabled for the cache? (for get)
> >>>
> >>> Neither Ignite users nor Ignite developers can make any decision based
> >>> on these metrics.
> >>>
> >>>> * `commit`, `rollback` time histograms. Measured for API calls on the 
> >>>> caller node side [3].
> >>>
> >>> What is transaction commit or rollback time? How is it calculated in
> >>> Ignite now? What actions are included in a transaction? What actions not
> >>> related to the cache are executed during transactions?
> >>>
> >>> There is no sense in the time of a transaction commit or rollback
> >>> because there is no way to understand which transaction was
> >>> performed in a particular period of time. Usually there are a lot of
> >>> transactions and we can't distinguish them from each other.
> >>>
> >>> Moreover, a transaction is usually treated as a business operation. So the
> >>> only way to measure performance properly is to measure business operation
> >>> time. That is, the user should create their own metrics set for some
> >>> business API.
> >>>
> >>> Further. What about cross-cache transactions? At the moment tx
> >>> commit/rollback time will be added to the corresponding metrics for each
> >>> cache involved in the transaction. The *same time* for *each cache*.
> >>> Absolutely meaningless.
> >>>
> >>> Again, neither Ignite users nor Ignite developers can make any decision
> >>> based on these metrics. But users can create their own metrics set.
> >>>
> >>>> * histograms that measure the time of processing `get`, `put`, `remove`, 
> >>>> `commit`, `rollback` messages on affinity nodes(primary and backups).
> >>>>  Ticket doesn't exist for it.
> >>>
> >>> It will be implemented for most types of messages.
> >>>
> >>> Metrics, application monitoring, performance analysis and measurement
> >>> are a little harder than they sound. Therefore, we must approach this
> >>> issue more carefully.
> >>> Blindly adding new types of metrics will not only fail to improve the
> >>> situation, but will also worsen the overall performance of the system
> >>> because metric calculation is always on the hot path.
> >>>
> >>> So, from my point of view, commits for get/put/remove and
> >>> commit/rollback should be reverted.
> >>>
> >>> On Mon, Dec 16, 2019 at 5:39 PM Nikita Amelchev <nsamelc...@gmail.com> 
> >>> wrote:
> >>>>
> >>>> I think these metrics are useful.
> >>>>
> >>>> I have prepared PR [1] for commit and rollback histograms. [2]
> >>>> Nikolay, could you take a look, please?
> >>>>
> >>>> If you do not mind, I will try to add affinity-nodes cache metrics:
> >>>>>> * histograms that measure the time of processing `get`, `put`, 
> >>>>>> `remove`, `commit`, `rollback` messages on affinity nodes(primary and 
> >>>>>> backups). Ticket doesn't exist for it.
> >>>>
> >>>> I have filed a ticket for it. [3]
> >>>>
> >>>> [1] https://github.com/apache/ignite/pull/7141
> >>>> [2] https://issues.apache.org/jira/browse/IGNITE-12450
> >>>> [3] https://issues.apache.org/jira/browse/IGNITE-12453
> >>>>
> >>>> On Mon, Dec 16, 2019 at 11:07, Alexei Scherbakov 
> >>>> <alexey.scherbak...@gmail.com> wrote:
> >>>>>
> >>>>> I think they are very useful.
> >>>>>
> >>>>> On Mon, Dec 16, 2019 at 10:51, Николай Ижиков <nizhi...@apache.org> wrote:
> >>>>>
> >>>>>> Hello, Alexei.
> >>>>>>
> >>>>>> Thanks for the link to the ticket, I labeled it with the IEP-35 label.
> >>>>>> What do you think about the proposed metrics set?
> >>>>>>
> >>>>>>> On Dec 16, 2019, at 10:29, Alexei Scherbakov <
> >>>>>> alexey.scherbak...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Nikolay,
> >>>>>>>
> >>>>>>> What about batch operations?
> >>>>>>>
> >>>>>>> For messages processing the ticket does exist and even has an
> >>>>>>> implementation from before new metrics API times [1]
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-10418
> >>>>>>>
> >>>>>>> On Mon, Dec 16, 2019 at 10:12, Николай Ижиков <nizhi...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Hello, Igniters.
> >>>>>>>>
> >>>>>>>> I want to give the user an answer to the following question: "How do
> >>>>>>>> cache API operations perform?"
> >>>>>>>> It seems we need to implement metrics for basic cache API operations
> >>>>>>>> like get, put, remove for it.
> >>>>>>>>
> >>>>>>>> I think we should provide the following metrics:
> >>>>>>>>
> >>>>>>>> * `get`, `put`, `remove` time histograms. Measured for API calls on 
> >>>>>>>> the
> >>>>>>>> caller node side.
> >>>>>>>>  Implemented in [1], commit [2].
> >>>>>>>>
> >>>>>>>> * `commit`, `rollback` time histograms. Measured for API calls on the
> >>>>>>>> caller node side [3].
> >>>>>>>>
> >>>>>>>> * histograms that measure the time of processing `get`, `put`, 
> >>>>>>>> `remove`,
> >>>>>>>> `commit`, `rollback` messages on affinity nodes(primary and backups).
> >>>>>>>>  Ticket doesn't exist for it.
> >>>>>>>>
> >>>>>>>> What do you think?
> >>>>>>>>
> >>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12219
> >>>>>>>> [2] https://github.com/apache/ignite/commit/e66bbef97b2cef73a533ce8a506ec479852cb364
> >>>>>>>> [3] https://issues.apache.org/jira/browse/IGNITE-12450
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Alexei Scherbakov
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Best regards,
> >>>>> Alexei Scherbakov
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best wishes,
> >>>> Amelchev Nikita
> >>
>
