[ https://issues.apache.org/jira/browse/IGNITE-19715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Petrov updated IGNITE-19715:
------------------------------------
    Description: 
Thin client operations can take a long time if partition awareness (PA) is enabled and some cluster nodes are not reachable over the network.

Consider the following scenario:
1. The thin client has already successfully established connections to all configured node addresses.
2. A particular cluster node becomes unreachable over the network. On Linux, this can be reproduced with iptables -A INPUT -p tcp --dport.
3. The thin client periodically sends a put request that PA maps to the unreachable node.
4. At first, every put attempt fails with `ClientException: Timeout was reached before computation completed.` Eventually, however, the OS closes the connection to the unreachable node (see tcp_keepalive_time on Linux).

This causes the connection to the unreachable node to be re-established while the next put is being handled (see ReliableChannel.java:1012). We currently do not set a timeout for opening a connection: GridNioClientConnectionMultiplexer#open uses Integer.MAX_VALUE as the timeout for Socket#connect(java.net.SocketAddress, int).

As a result, the put operation hangs for a significant amount of time (the duration depends on OS parameters; usually it is a couple of minutes). This is confusing for users because a single PUT takes much longer than the configured ClientConfiguration#setTimeout property.

  was:
Thin client operations can take a long time if PA is enabled and some cluster nodes are not reachable over the network.

Consider the following scenario:
1. The thin client has already successfully established connections to all configured node addresses.
2. A particular cluster node becomes unreachable over the network. On Linux, this can be reproduced with iptables -A INPUT -p tcp --dport.
3. The thin client periodically sends a put request that PA maps to the unreachable node.
4. At first, every put attempt fails with `ClientException: Timeout was reached before computation completed.` Eventually, however, the OS closes the connection to the unreachable node (see tcp_keepalive_time on Linux).

This causes the connection to the unreachable node to be re-established while the next put is being handled (see ReliableChannel.java:1012). We currently do not set a timeout for opening a connection: GridNioClientConnectionMultiplexer#open uses Integer.MAX_VALUE as the timeout for Socket#connect(java.net.SocketAddress, int).

As a result, the put operation hangs for a significant amount of time (the duration depends on OS parameters; usually it is a couple of minutes) and ignores the ClientConfiguration#setTimeout property.

> Thin client operations can take a long time if PA is enabled and some cluster nodes are not network reachable.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-19715
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19715
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
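This issue's root cause is a reconnect whose Socket#connect call effectively has no timeout. The stdlib-only Java sketch below illustrates the opposite approach: the connect phase is bounded by the time remaining until a per-operation deadline. This is not Ignite code, and it uses a blocking java.net.Socket rather than Ignite's NIO multiplexer. The class BoundedConnect, its methods, and the idea of deriving the deadline from the ClientConfiguration#setTimeout value are all hypothetical. The sketch also encodes one documented subtlety: Socket#connect treats a timeout of 0 as infinite, so the helper never returns 0.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Ignite: bounds the TCP connect phase by the
// time left until an operation deadline instead of using Integer.MAX_VALUE.
final class BoundedConnect {
    private BoundedConnect() {
    }

    /**
     * Computes the timeout (ms) to pass to Socket#connect(SocketAddress, int).
     * Returns -1 if the deadline has already passed. Never returns 0, because
     * Socket#connect interprets a timeout of 0 as "wait forever".
     */
    static int connectTimeoutMillis(long deadlineNanos, long nowNanos) {
        // Subtracting nanoTime values is the overflow-safe way to compare them.
        long remainingNanos = deadlineNanos - nowNanos;
        if (remainingNanos <= 0)
            return -1;
        // Clamp a sub-millisecond remainder up to 1 ms and large values down to int range.
        long millis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
        return (int) Math.min(millis, Integer.MAX_VALUE);
    }

    /** Opens a blocking socket whose connect phase cannot outlive the deadline. */
    static Socket open(InetSocketAddress addr, long deadlineNanos) throws IOException {
        int timeoutMillis = connectTimeoutMillis(deadlineNanos, System.nanoTime());
        if (timeoutMillis < 0)
            throw new SocketTimeoutException("Operation timeout reached before connecting to " + addr);
        Socket sock = new Socket();
        try {
            // Throws SocketTimeoutException if the peer does not answer in time,
            // e.g. when a firewall rule silently drops its packets.
            sock.connect(addr, timeoutMillis);
            return sock;
        } catch (IOException e) {
            try {
                sock.close();
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }
}
```

In this sketch, a caller would compute the deadline once per operation, for example `long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);`. The same deadline would then govern both the reconnect and the wait for the response. That way, a reconnect cannot stretch a single put far beyond the configured timeout.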