[
https://issues.apache.org/jira/browse/IGNITE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912282#comment-16912282
]
Denis Mekhanikov commented on IGNITE-10808:
-------------------------------------------
[~sergey-chugunov], I reverted the changes for
{{TcpDiscoveryClientAckResponse}}. Now this is the only high priority message.
If we decide to remove high priority discovery messages completely, then let's
do it in a different ticket.
I also did some refactoring. Could you take another look?
> Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage
> --------------------------------------------------------------------------
>
> Key: IGNITE-10808
> URL: https://issues.apache.org/jira/browse/IGNITE-10808
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.7
> Reporter: Stanislav Lukyanov
> Assignee: Denis Mekhanikov
> Priority: Major
> Labels: discovery
> Fix For: 2.8
>
> Attachments: IgniteMetricsOverflowTest.java
>
>
> A node receives a new metrics update message every `metricsUpdateFrequency`
> milliseconds, and the message will be put at the top of the queue (because it
> is a high priority message).
> If processing one message takes more than `metricsUpdateFrequency` then
> multiple `TcpDiscoveryMetricsUpdateMessage` will be in the queue. A long
> enough delay (e.g. caused by a network glitch or GC) may lead to the queue
> building up tens of metrics update messages that are essentially pointless to
> process. Finally, if processing a message takes, on average, a little more
> than `metricsUpdateFrequency` (even for a relatively short period of time,
> say, for a minute due to network issues) then the message worker will end up
> processing only the metrics updates and the cluster will essentially hang.
> Reproducer is attached. In the test, the queue first builds up and is then
> torn down very slowly, causing "Failed to wait for PME" messages.
> ServerImpl's SocketReader should be changed so that it does not put another
> metrics update message at the top of the queue if one is already there (or
> so that it replaces the one at the top with the new one).
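
The fix proposed at the end of the description can be sketched roughly as
follows. This is only an illustration of the "replace the pending metrics
update" idea, not the actual ServerImpl code: the class DiscoveryQueueSketch,
the Msg type and its fields are hypothetical names introduced for this example,
and the real discovery queue in Ignite is structured differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a message queue that keeps at most one pending
 * metrics update. Not Ignite's real implementation.
 */
class DiscoveryQueueSketch {
    /** Hypothetical stand-in for a discovery message. */
    static final class Msg {
        final long id;
        /** True if this message plays the role of a metrics update. */
        final boolean metricsUpdate;

        Msg(long id, boolean metricsUpdate) {
            this.id = id;
            this.metricsUpdate = metricsUpdate;
        }
    }

    private final Deque<Msg> queue = new ArrayDeque<>();

    /** Adds a message; a new metrics update replaces any pending one. */
    synchronized void add(Msg msg) {
        if (msg.metricsUpdate) {
            // Drop the stale metrics update (if any) so they cannot pile up.
            queue.removeIf(m -> m.metricsUpdate);
            // Metrics updates are treated as high priority: head of the queue.
            queue.addFirst(msg);
        }
        else {
            // Regular messages keep their arrival order at the tail.
            queue.addLast(msg);
        }
    }

    /** Returns the next message to process, or null if the queue is empty. */
    synchronized Msg poll() {
        return queue.pollFirst();
    }

    synchronized int size() {
        return queue.size();
    }
}
```

With this approach, however long the message worker is delayed, the queue holds
at most one metrics update (the newest one), so regular discovery messages are
not starved by a backlog of outdated updates.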