Guys,

What if we calculate it on both sides? The client would track the total time needed to complete an operation, including network hops, while a server (primary or backup) would count only its local processing time.
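A minimal sketch of the two-sided idea, not Ignite code. Every name here (PutMetricsSketch, TwoSidedTimingSketch, clientPut, serverPut) is hypothetical, and the network hops are simulated with a short park rather than real messaging. It only shows that each side updates its own counter and its own timer, so the client average includes the hops and the server average does not.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical per-node holder for put metrics; not Ignite's CacheMetrics API. */
final class PutMetricsSketch {
    private final LongAdder puts = new LongAdder();
    private final LongAdder totalPutNanos = new LongAdder();

    /** Records one put: the counter and the time are updated on the same node. */
    void onPut(long durationNanos) {
        puts.increment();
        totalPutNanos.add(durationNanos);
    }

    long puts() {
        return puts.sum();
    }

    /** Average put time in nanoseconds, or 0 if nothing was recorded. */
    double averagePutNanos() {
        long n = puts.sum();
        return n == 0 ? 0.0 : (double) totalPutNanos.sum() / n;
    }
}

/** Hypothetical client/server pair living in one JVM, for illustration only. */
final class TwoSidedTimingSketch {
    final PutMetricsSketch clientMetrics = new PutMetricsSketch();
    final PutMetricsSketch serverMetrics = new PutMetricsSketch();
    private final Map<String, String> store = new ConcurrentHashMap<>();

    /** Server side: times only the local update. */
    void serverPut(String key, String val) {
        long start = System.nanoTime();
        store.put(key, val);
        serverMetrics.onPut(System.nanoTime() - start);
    }

    /** Client side: times the whole round trip, including both simulated hops. */
    void clientPut(String key, String val, long simulatedHopNanos) {
        long start = System.nanoTime();
        LockSupport.parkNanos(simulatedHopNanos); // request hop (simulated)
        serverPut(key, val);
        LockSupport.parkNanos(simulatedHopNanos); // response hop (simulated)
        clientMetrics.onPut(System.nanoTime() - start);
    }
}
```

With this split, the difference between the client and server averages gives a rough view of communication overhead without adding a separate metric for it.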
--
Denis

> On Jul 17, 2017, at 7:07 AM, Andrey Gura <ag...@apache.org> wrote:
>
> Hi,
>
> I believe that the first solution is better than second because it
> takes into account network communication time. Average time of
> communication between nodes doesn't make sense from my point of view.
>
> So I vote for #1.
>
> On Thu, Jul 13, 2017 at 11:52 PM, Вячеслав Коптилин
> <slava.kopti...@gmail.com> wrote:
>> Hi Experts,
>>
>> I am working on https://issues.apache.org/jira/browse/IGNITE-3495
>>
>> A few words about this issue:
>> It is about that the process of gathering/updating of cache metrics is
>> inconsistent in some cases.
>> Let's consider the following simple topology which contains only two nodes:
>> first node is a client node and the second is a server.
>> And client node starts requests to the server node, for instance
>> cache.put(), cache.putAll(), cache.get() etc.
>> In that case, metrics which are related to counters (cache hits, cache
>> misses, removals and puts) are calculated on the server side,
>> while time metrics are updated on the client node.
>>
>> I think that both metrics (counters and time) should be calculated on the
>> same node. So, there are two obvious solution:
>>
>> #1 Node that starts some operation is responsible for updating the cache
>> metrics.
>> Pro:
>> - it will allow to get more accurate results of metrics.
>> Contra:
>> - this approach does not work in particular cases. for example, partitioned
>> cache with FULL_ASYNC write synchronization mode.
>> - needs to extend response messages (GridNearAtomicUpdateResponse,
>> GridNearGetResponse etc)
>> in order to provide additional information from remote node: cache hits,
>> number of removal etc.
>> So, it will lead to additional pressure on communication channel.
>> Perhaps, this impact will be small - 4 bytes per message or something like
>> that.
>> - backward incompatibility (this is a consequence of the previous point)
>>
>> #2 Primary node (node that actually executes a request)
>> Pro:
>> - easy to implement
>> - backward compatible
>> Contra:
>> - time metrics will not include the time of communication between nodes, so
>> the results will be less accurate.
>> - perhaps we need to provide additional metric which will allow to get avg
>> time of communication between nodes.
>>
>> Please let me know about your thoughts.
>> Perhaps, both alternatives are not so good...
>>
>> Regards,
>> Slava.