[
https://issues.apache.org/jira/browse/IGNITE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506083#comment-16506083
]
Andrey Gura edited comment on IGNITE-8751 at 6/8/18 2:43 PM:
-------------------------------------------------------------
It isn't a race. {{tcp-disco-srvr}} and {{tcp-disco-msg-worker}} are interrupted
before the segmentation policy handles the segmentation. See
{{org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.DiscoveryWorker#onSegmentation}},
where we first disconnect the SPI and only then handle segmentation.
It seems this could be fixed by adding a check of the SPI state to the exception handlers of
{{tcp-disco-srvr}} and {{tcp-disco-msg-worker}}.
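A minimal, self-contained sketch of the proposed check, assuming the SPI publishes a "disconnecting" state before it interrupts its workers. Every name here ({{SpiState}}, {{startWorker}}, {{runSegmentation}}, the {{tcp-disco-srvr-sketch}} thread) is hypothetical and is not Ignite's API. The worker simply blocks until interrupted and stands in for {{tcp-disco-srvr}}, and a flag stands in for the failure handler that halted the JVM in the logs below.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

class SegmentationOrderSketch {
    // Hypothetical SPI states; not Ignite's real state model.
    enum SpiState { CONNECTED, DISCONNECTING, DISCONNECTED }

    static final AtomicReference<SpiState> spiState = new AtomicReference<>(SpiState.CONNECTED);

    // Stands in for the failure handler: set when a worker reports an unexpected termination.
    static final AtomicBoolean criticalFailureReported = new AtomicBoolean(false);

    // Hypothetical worker standing in for tcp-disco-srvr: it blocks until interrupted.
    static Thread startWorker(boolean checkSpiState, CountDownLatch started) {
        Thread worker = new Thread(() -> {
            try {
                started.countDown();
                while (true)
                    Thread.sleep(1_000); // Like blocking in accept() or on a message queue.
            }
            catch (InterruptedException e) {
                // Proposed check: an interrupt during SPI disconnect is expected, so exit quietly.
                if (checkSpiState && spiState.get() != SpiState.CONNECTED)
                    return;
                // Without the check, any exit looks like an unexpected system worker termination.
                criticalFailureReported.set(true);
            }
        }, "tcp-disco-srvr-sketch");
        worker.start();
        return worker;
    }

    // Mimics the order in onSegmentation: disconnect the SPI first, then apply the policy.
    // Returns true if a critical failure was reported before the policy got a chance to run.
    static boolean runSegmentation(boolean checkSpiState) {
        spiState.set(SpiState.CONNECTED);
        criticalFailureReported.set(false);
        CountDownLatch started = new CountDownLatch(1);
        try {
            Thread worker = startWorker(checkSpiState, started);
            started.await();

            // Step 1: disconnect the SPI. The state is published before the interrupt,
            // so the worker can tell an expected shutdown from a real failure.
            spiState.set(SpiState.DISCONNECTING);
            worker.interrupt();
            worker.join();
            spiState.set(SpiState.DISCONNECTED);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running the sketch", e);
        }
        // Step 2: the segmentation policy would run here.
        return criticalFailureReported.get();
    }

    public static void main(String[] args) {
        System.out.println("without check: critical failure reported = " + runSegmentation(false));
        System.out.println("with check:    critical failure reported = " + runSegmentation(true));
    }
}
```

Running {{main}} prints {{true}} for the unchecked worker and {{false}} for the checked one. The sketch relies on the disconnect path changing the state before calling {{interrupt()}}; if that order were reversed, the worker could still observe {{CONNECTED}} and report a failure.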
was (Author: agura):
It isn't a race. {{tcp-disco-srvr}} is interrupted before the segmentation
policy handles the segmentation. See
{{org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.DiscoveryWorker#onSegmentation}},
where we first disconnect the SPI and only then handle segmentation.
It seems this could be fixed by adding a check of the SPI state to the exception handler of
{{tcp-disco-srvr}}.
> Possible race on node segmentation.
> -----------------------------------
>
> Key: IGNITE-8751
> URL: https://issues.apache.org/jira/browse/IGNITE-8751
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Andrew Mashenkov
> Assignee: Andrey Gura
> Priority: Major
> Fix For: 2.6
>
>
> The segmentation policy may be ignored, probably due to a race.
> See [1] for details.
> [1]
> [http://apache-ignite-users.70518.x6.nabble.com/Node-pause-for-no-obvious-reason-td21923.html]
> Logs from the segmented node:
> [08:42:42,290][INFO][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Finished
> serving remote node connection [rmtAddr=/10.29.42.45:38712, rmtPort=38712
> [08:42:42,290][WARNING][disco-event-worker-#161][GridDiscoveryManager] Local
> node SEGMENTED: TcpDiscoveryNode [id=8333aa56-8bf4-4558-a387-809b1d2e2e5b,
> addrs=[10.29.42.44, 127.0.0.1], sockAddrs=[sap-datanode1/10.29.42.44:49500,
> /127.0.0.1:49500], discPort=49500, order=1, intOrder=1,
> lastExchangeTime=1528447362286, loc=true, ver=2.5.0#20180523-sha1:86e110c7,
> isClient=false]
> [08:42:42,294][SEVERE][tcp-disco-srvr-#2][] Critical system error detected.
> Will be handled accordingly to configured handler [hnd=class
> o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread
> tcp-disco-srvr-#2 is terminated unexpectedly.]]
> java.lang.IllegalStateException: Thread tcp-disco-srvr-#2 is terminated unexpectedly.
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> [08:42:42,294][SEVERE][tcp-disco-srvr-#2][] JVM will be halted immediately
> due to the failure: [failureCtx=FailureContext
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread
> tcp-disco-srvr-#2 is terminated unexpectedly.]]