Cache Metrics

Вячеслав Коптилин Thu, 13 Jul 2017 13:54:00 -0700

Hi Experts,

I am working on https://issues.apache.org/jira/browse/IGNITE-3495


A few words about this issue:
Cache metrics are gathered and updated inconsistently in some cases.
Consider a simple topology with only two nodes: the first is a client node
and the second is a server node.
The client node sends requests to the server node, for instance
cache.put(), cache.putAll(), cache.get(), etc.
In that case, the counter metrics (cache hits, cache misses, removals and
puts) are calculated on the server side, while the time metrics are updated
on the client node.

I think that both kinds of metrics (counters and time) should be calculated
on the same node. There are two obvious solutions:

#1 The node that starts an operation is responsible for updating the cache
metrics.
Pro:
- the metrics will be more accurate, because the measured time includes
  communication between nodes.
Contra:
- this approach does not work in some cases, for example a partitioned
  cache with FULL_ASYNC write synchronization mode.
- the response messages (GridNearAtomicUpdateResponse, GridNearGetResponse,
  etc.) need to be extended to carry additional information from the remote
  node: cache hits, number of removals, etc.
  This puts additional pressure on the communication channel.
  Perhaps the impact will be small: 4 bytes per message or something like
  that.
- backward incompatibility (a consequence of the previous point)
#2 The primary node (the node that actually executes the request) is
responsible for updating the cache metrics.
Pro:
- easy to implement
- backward compatible
Contra:
- time metrics will not include the time of communication between nodes, so
  the results will be less accurate.
- perhaps we need to provide an additional metric that reports the average
  time of communication between nodes.
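
And a similar sketch for #2, again in plain Java and not based on the Ignite
API. PrimaryMetricsSketch is a hypothetical name, and the injectable clock is
my own assumption, added only to make the example testable. The primary node
updates the counters and the time metric locally, so the measured time covers
only the local execution and never the network.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical primary node that records counters and time where the request executes. */
final class PrimaryMetricsSketch {
    private final Map<String, String> store = new HashMap<>();
    private final LongSupplier clock; // nanosecond clock, injectable for tests

    // Plain fields keep the sketch short; a real implementation would need
    // thread-safe counters.
    long hits;
    long misses;
    long gets;
    long totalGetNanos;

    PrimaryMetricsSketch(LongSupplier clock) {
        this.clock = clock;
    }

    void put(String key, String value) {
        store.put(key, value);
    }

    String get(String key) {
        long start = clock.getAsLong();
        String value = store.get(key);
        // Only the local execution is timed; communication time is not visible here.
        totalGetNanos += clock.getAsLong() - start;

        gets++;
        if (value != null)
            hits++;
        else
            misses++;

        return value;
    }

    /** Average local execution time of get(), in nanoseconds. */
    double averageGetNanos() {
        return gets == 0 ? 0.0 : (double) totalGetNanos / gets;
    }
}
```

In production the clock would be System::nanoTime. The response message stays
unchanged, which is where the backward compatibility comes from.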

Please let me know what you think.
Perhaps neither alternative is good enough...

Regards,
Slava.
