[
https://issues.apache.org/jira/browse/IGNITE-28097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Petrov updated IGNITE-28097:
------------------------------------
Description:
We need to fix the flaky CommunicationConnectionPoolMetricsTest; see
https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8162810346300672703&tab=testDetails&branch_IgniteTests24Java8=__all_branches__
and the tests with the same name but different parameters.
Steps that lead to the test hanging:
1. The cluster consists of server nodes (crd, srv) and one client node (cli).
srv is the router for cli.
2. srv is stopped, and cli is attempting to reconnect to another cluster node
(see org.apache.ignite.spi.discovery.tcp.ClientImpl.Reconnector).
3. During the reconnection process, cli is stopped. However, due to incorrect
exception handling, cli simply opens a socket to crd.
{code:java}
org.apache.ignite.spi.IgniteSpiException: Wrong Ignite instance is set: null
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiAdapter$GridDummySpiContext.addTimeoutObject(IgniteSpiAdapter.java:958)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiAdapter.addTimeoutObject(IgniteSpiAdapter.java:642)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.startTimer(TcpDiscoverySpi.java:2451)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.writeMessage(TcpDiscoverySpi.java:1756)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TestTcpDiscoverySpi.writeMessage(TestTcpDiscoverySpi.java:62)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:817)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequests(ClientImpl.java:646)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:608)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1601)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
{code}
4. cli cannot send a TcpDiscoveryNodeLeftMessage, and crd considers cli
reconnected, so it does not generate a NODE_LEFT event.
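
The failure mode in steps 3 and 4 can be sketched in isolation. The example below is a minimal illustration and not Ignite code: FakeSocket, JoinRequestWriter, sendJoinRequestLeaky and sendJoinRequestSafely are hypothetical names, and the assumption that only checked I/O errors were handled is a guess at what "incorrect exception handling" means here. It shows how an unchecked exception thrown while writing a join request (as under writeMessage in the trace above) can bypass a handler that closes the socket, and how closing on any failure avoids the leak.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of the unclosed-socket scenario; none of these types exist in Ignite. */
public class ReconnectSocketSketch {
    /** Stand-in for a discovery socket that records whether it was closed. */
    static final class FakeSocket implements Closeable {
        final AtomicBoolean closed = new AtomicBoolean();

        @Override public void close() {
            closed.set(true);
        }
    }

    /** Stand-in for the code that writes a join request to the socket. */
    interface JoinRequestWriter {
        void write(FakeSocket sock) throws IOException;
    }

    /** Assumed faulty shape: only IOException closes the socket, unchecked exceptions skip the cleanup. */
    static void sendJoinRequestLeaky(FakeSocket sock, JoinRequestWriter writer) {
        try {
            writer.write(sock);
        }
        catch (IOException e) {
            sock.close();
        }
    }

    /** Safer shape: the socket is closed whenever the write does not complete, whatever was thrown. */
    static void sendJoinRequestSafely(FakeSocket sock, JoinRequestWriter writer) throws IOException {
        boolean sent = false;

        try {
            writer.write(sock);

            sent = true;
        }
        finally {
            if (!sent)
                sock.close();
        }
    }

    public static void main(String[] args) {
        // Simulates a node that is stopping: the write fails with an unchecked exception.
        JoinRequestWriter failing = s -> {
            throw new IllegalStateException("node is stopping");
        };

        FakeSocket leaky = new FakeSocket();

        try {
            sendJoinRequestLeaky(leaky, failing);
        }
        catch (RuntimeException ignored) {
            // The caller sees the failure, but the socket was left open.
        }

        FakeSocket safe = new FakeSocket();

        try {
            sendJoinRequestSafely(safe, failing);
        }
        catch (Exception ignored) {
            // The caller sees the failure, and the socket is already closed.
        }

        System.out.println("leaky closed: " + leaky.closed.get()); // false
        System.out.println("safe closed: " + safe.closed.get());   // true
    }
}
```

In the scenario described above, a socket left open in this way is what lets crd keep treating cli as connected.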
was:
We need to fix the flaky CommunicationConnectionPoolMetricsTest; see
https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8162810346300672703&tab=testDetails&branch_IgniteTests24Java8=__all_branches__
and the tests with the same name but different parameters.
Steps that lead to the test hanging:
1. The cluster consists of server nodes (crd, srv) and one client node (cli).
srv is the router for cli.
2. srv is stopped, and cli is attempting to reconnect to another cluster node
(see org.apache.ignite.spi.discovery.tcp.ClientImpl.Reconnector).
3. During the reconnection process, cli is stopped. However, due to incorrect
exception handling, cli simply opens a socket to crd.
{code:java}
org.apache.ignite.spi.IgniteSpiException: Wrong Ignite instance is set: null
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiAdapter$GridDummySpiContext.addTimeoutObject(IgniteSpiAdapter.java:958)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiAdapter.addTimeoutObject(IgniteSpiAdapter.java:642)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.startTimer(TcpDiscoverySpi.java:2451)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.writeMessage(TcpDiscoverySpi.java:1756)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TestTcpDiscoverySpi.writeMessage(TestTcpDiscoverySpi.java:62)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:817)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequests(ClientImpl.java:646)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:608)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1601)
[16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
{code}
But cli also cannot send a TcpDiscoveryNodeLeftMessage.
4. crd considers cli reconnected and does not generate a NODE_LEFT event.
> Fixed unclosed socket if the client node stopped during a reconnect.
> --------------------------------------------------------------------
>
> Key: IGNITE-28097
> URL: https://issues.apache.org/jira/browse/IGNITE-28097
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Petrov
> Assignee: Mikhail Petrov
> Priority: Minor
> Labels: ise
> Fix For: 2.19
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> We need to fix the flaky CommunicationConnectionPoolMetricsTest; see
> https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8162810346300672703&tab=testDetails&branch_IgniteTests24Java8=__all_branches__
> and the tests with the same name but different parameters.
> Steps that lead to the test hanging:
> 1. The cluster consists of server nodes (crd, srv) and one client node (cli).
> srv is the router for cli.
> 2. srv is stopped, and cli is attempting to reconnect to another cluster node
> (see org.apache.ignite.spi.discovery.tcp.ClientImpl.Reconnector).
> 3. During the reconnection process, cli is stopped. However, due to incorrect
> exception handling, cli simply opens a socket to crd.
> {code:java}
> org.apache.ignite.spi.IgniteSpiException: Wrong Ignite instance is set: null
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiAdapter$GridDummySpiContext.addTimeoutObject(IgniteSpiAdapter.java:958)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiAdapter.addTimeoutObject(IgniteSpiAdapter.java:642)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.startTimer(TcpDiscoverySpi.java:2451)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.writeMessage(TcpDiscoverySpi.java:1756)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.TestTcpDiscoverySpi.writeMessage(TestTcpDiscoverySpi.java:62)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:817)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequests(ClientImpl.java:646)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:608)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1601)
> [16:24:39]W: [org.apache.ignite:ignite-core] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> {code}
> 4. cli cannot send a TcpDiscoveryNodeLeftMessage, and crd considers cli
> reconnected, so it does not generate a NODE_LEFT event.