[ 
https://issues.apache.org/jira/browse/IGNITE-19715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732532#comment-17732532
 ] 

Ignite TC Bot commented on IGNITE-19715:
----------------------------------------

{panel:title=Branch: [pull/10770/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10770/head] Base: [master] : No new tests 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7216123&buildTypeId=IgniteTests24Java8_RunAll]

> Thin client operations can take a long time if PA is enabled and some cluster 
> nodes are not network reachable.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-19715
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19715
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Assignee: Mikhail Petrov
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Thin client operations can take a long time if partition awareness (PA) is 
> enabled and some cluster nodes are not reachable over the network.
> Consider the following scenario:
> 1. The thin client has already successfully established connections to all 
> configured node addresses.
> 2. A particular cluster node becomes unreachable over the network. On Linux 
> this can be reproduced with iptables -A INPUT -p tcp --dport
> 3. The thin client periodically sends a put request that PA maps to the 
> unreachable node.
> 4. At first, all attempts to perform the put fail with `ClientException: 
> Timeout was reached before computation completed.` Eventually, however, the 
> OS closes the connection to the unreachable node (see tcp_keepalive_time on 
> Linux).
> As a result, the connection to the unreachable node is reestablished while 
> the next put is handled (see ReliableChannel.java:1012).
> We currently do not set a timeout for the connection open operation (see 
> GridNioClientConnectionMultiplexer#open, where Integer.MAX_VALUE is passed to 
> Socket#connect(java.net.SocketAddress, int)).
> Consequently, the Socket#connect operation (and hence the put operation) hangs 
> for a significant amount of time (depending on OS parameters, usually a couple 
> of minutes). This is confusing for users because a single put may take much 
> longer than the timeout configured via ClientConfiguration#setTimeout.
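
The effect of the connect timeout described above can be illustrated with plain java.net sockets. The following is a minimal sketch, not the Ignite code and not necessarily the fix adopted for this issue: the class name BoundedConnect, the helper tryConnect, and the 5000 ms value are hypothetical, and the default port 10800 in main is an assumption about a typical thin client setup. It only shows that passing a finite timeout to Socket#connect(java.net.SocketAddress, int) bounds how long the caller waits.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;
import java.net.SocketTimeoutException;

/** Hypothetical helper: opens a TCP connection with a bounded wait. */
class BoundedConnect {
    /**
     * Tries to connect to addr, waiting at most timeoutMs milliseconds.
     * Returns true if the connection was established, false otherwise.
     */
    static boolean tryConnect(SocketAddress addr, int timeoutMs) {
        // Socket#connect treats a timeout of 0 as "wait indefinitely",
        // so non-positive values are rejected here.
        if (timeoutMs <= 0)
            throw new IllegalArgumentException("timeoutMs must be positive");

        try (Socket sock = new Socket()) {
            // With a finite timeout the call throws SocketTimeoutException
            // instead of blocking until the OS gives up on the attempt.
            sock.connect(addr, timeoutMs);
            return true;
        }
        catch (SocketTimeoutException e) {
            // The peer did not answer within timeoutMs.
            return false;
        }
        catch (IOException e) {
            // Refused, unreachable, or another I/O failure.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port come from the command line; the defaults are assumptions.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10800;

        long start = System.nanoTime();
        boolean ok = tryConnect(new InetSocketAddress(host, port), 5000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("connected=" + ok + ", elapsedMs=" + elapsedMs);
    }
}
```

Against a node whose packets are silently dropped, a call like this should return after roughly the given timeout instead of after the OS-level connect timeout.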



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
