[jira] [Commented] (IGNITE-8985) Node segmented itself after connRecoveryTimeout

ASF GitHub Bot (JIRA) Mon, 16 Jul 2018 05:52:19 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545173#comment-16545173
 ]


ASF GitHub Bot commented on IGNITE-8985:
----------------------------------------

GitHub user dkarachentsev opened a pull request:

    https://github.com/apache/ignite/pull/4365

    IGNITE-8985 - Node segmented itself after connRecoveryTimeout. Improv…

    …ed loopback resolving from IGNITE-8683.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-8985

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4365
    
----
commit a61635b306d3e7a3977708ec027ab31c67f8ad1a
Author: dkarachentsev <dkarachentsev@...>
Date:   2018-07-16T12:50:32Z

    IGNITE-8985 - Node segmented itself after connRecoveryTimeout. Improved loopback resolving from IGNITE-8683.

----


> Node segmented itself after connRecoveryTimeout
> -----------------------------------------------
>
>                 Key: IGNITE-8985
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8985
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Cherkasov
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>         Attachments: Archive.zip
>
>
> I see the following messages in the logs:
> [2018-07-10 16:27:13,111][WARN ][tcp-disco-msg-worker-#2] Unable to connect 
> to next nodes in a ring, it seems local node is experiencing connectivity 
> issues. Segmenting local node to avoid case when one node fails a big part of 
> cluster. To disable that behavior set 
> TcpDiscoverySpi.setConnectionRecoveryTimeout() to 0. 
> [connRecoveryTimeout=10000, effectiveConnRecoveryTimeout=10000]
> [2018-07-10 16:27:13,112][WARN ][disco-event-worker-#61] Local node 
> SEGMENTED: TcpDiscoveryNode [id=e1a19d8e-2253-458c-9757-e3372de3bef9, 
> addrs=[127.0.0.1, 172.17.0.1, 172.25.1.17], sockAddrs=[/172.17.0.1:47500, 
> lab17.gridgain.local/172.25.1.17:47500, /127.0.0.1:47500], discPort=47500, 
> order=2, intOrder=2, lastExchangeTime=1531229233103, loc=true, 
> ver=2.4.7#20180710-sha1:a48ae923, isClient=false]
> The failure detection timeout is set to 60_000, and GC pauses during the test
> were under 25 seconds, so I did not expect the node to be segmented.
>  
> Logs are attached.
>  
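
The report compares two thresholds: the connRecoveryTimeout of 10000 printed in the log and the failure detection timeout of 60_000 from the description. The following is a minimal sketch of that arithmetic only. It is not Ignite code, and it does not claim to explain the bug (the pull request attributes the fix to loopback resolving). The class and method names are hypothetical, the values are assumed to be milliseconds, and the rule that the effective recovery timeout is the smaller of the two settings is an assumption inferred from the effectiveConnRecoveryTimeout field in the log.

```java
import java.time.Duration;

/** Illustrative only: hypothetical helper, not part of Apache Ignite. */
final class RecoveryTimeoutSketch {
    /** Assumed rule: the effective recovery timeout is the smaller of the two settings. */
    static Duration effectiveRecoveryTimeout(Duration connRecoveryTimeout,
                                             Duration failureDetectionTimeout) {
        return connRecoveryTimeout.compareTo(failureDetectionTimeout) < 0
            ? connRecoveryTimeout
            : failureDetectionTimeout;
    }

    /** Returns true if a stall of the given length is strictly longer than the timeout. */
    static boolean exceeds(Duration pause, Duration timeout) {
        return pause.compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Duration connRecovery = Duration.ofMillis(10_000);     // from the log line
        Duration failureDetection = Duration.ofMillis(60_000); // from the report, unit assumed
        Duration gcPause = Duration.ofSeconds(25);             // upper bound given in the report

        Duration effective = effectiveRecoveryTimeout(connRecovery, failureDetection);
        System.out.println("effective recovery timeout, ms: " + effective.toMillis());
        System.out.println("pause exceeds recovery timeout: " + exceeds(gcPause, effective));
        System.out.println("pause exceeds failure detection timeout: "
            + exceeds(gcPause, failureDetection));
    }
}
```

Under these assumptions, a stall near the 25-second upper bound stays well inside the 60-second failure detection timeout but is longer than the 10-second recovery timeout, which is why the two settings have to be considered together when reading the log above.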



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
