[
https://issues.apache.org/jira/browse/IGNITE-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155582#comment-16155582
]
ASF GitHub Bot commented on IGNITE-6256:
----------------------------------------
GitHub user AMashenkov opened a pull request:
https://github.com/apache/ignite/pull/2603
IGNITE-6256: DiscoCache should always contains local node.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gridgain/apache-ignite ignite-6256
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/ignite/pull/2603.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2603
----
commit 46ca1c6dcc7163a8a997cde7360d48a29c1e482d
Author: Andrey V. Mashenkov <[email protected]>
Date: 2017-09-06T15:33:31Z
IGNITE-6256: DiscoCache always contains local node.
----
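The commit title suggests that the change keeps the local node in every DiscoCache, whatever the discovery ring reports. The following is a minimal sketch of that idea, not the code from the pull request: the class `LocalNodeAliveSketch`, the method `aliveNodes`, and the use of plain string ids are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in Ignite. */
final class LocalNodeAliveSketch {
    /**
     * Filters a topology snapshot down to alive node ids.
     * A node counts as alive if the ring still lists it; the local id is always kept.
     */
    static List<String> aliveNodes(List<String> snapshot, Set<String> ringIds, String locId) {
        List<String> alive = new ArrayList<>();

        for (String id : snapshot) {
            if (id.equals(locId) || ringIds.contains(id))
                alive.add(id);
        }

        // Assumption: if the snapshot itself omitted the local node, add it anyway.
        if (!alive.contains(locId))
            alive.add(locId);

        return alive;
    }

    public static void main(String[] args) {
        List<String> snapshot = Arrays.asList("local", "remote-1", "remote-2");
        Set<String> clearedRing = new HashSet<>();

        // With a cleared ring, only the local node is left in the result.
        System.out.println(aliveNodes(snapshot, clearedRing, "local"));
    }
}
```

With a result that is never empty, a lookup for the oldest alive node always has at least the local node to return.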
> When a node becomes segmented an AssertionError is thrown during
> GridDhtPartitionTopologyImpl.removeNode
> --------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-6256
> URL: https://issues.apache.org/jira/browse/IGNITE-6256
> Project: Ignite
> Issue Type: Bug
> Components: general
> Affects Versions: 1.8
> Reporter: Alexandr Fedotov
> Assignee: Denis Mekhanikov
> Fix For: 2.3
>
>
> The assert is as follows:
> exception="java.lang.AssertionError: null
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.removeNode(GridDhtPartitionTopologyImpl.java:1422)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.beforeExchange(GridDhtPartitionTopologyImpl.java:490)
> at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:769)
> at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:504)
> at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1689)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Below is the sequence of steps that leads to the assertion error:
> 1) SegmentCheckWorker determines that the node is SEGMENTED after
> an EVT_NODE_FAILED has been received.
> 2) The node gets visibleRemoteNodes from its TcpDiscoveryNodesRing
> 3) Clears the TcpDiscoveryNodesRing leaving only self on the list. The node
> ring is used to determine if a node is alive
> during DiscoCache creation
> 4) After that, the node initiates removal of all the nodes read in step 2
> 5) For each node, it sends an EVT_NODE_FAILED to the corresponding
> DiscoverySpiListener
> providing a topology containing all the nodes except already processed
> 6) This event gets into GridDiscoveryManager
> 7) The node gets removed from alive nodes for every DiscoCache in
> discoCacheHist
> 8) Topology change is detected
> 9) Creation of a new DiscoCache is attempted. At this moment no remote
> node is available because the
> TcpDiscoveryNodesRing has been cleared, which results in a DiscoCache with
> empty alives
> 10) The event with the created DiscoCache and the new topology version is
> passed to DiscoveryWorker
> 11) The event is eventually handled by DiscoveryWorker and is recorded by
> DiscoveryWorker#recordEvent
> 12) The recording is handled by GridEventStorageManager which notifies every
> listener for this event type (EVT_NODE_FAILED)
> 13) One of the listeners is GridCachePartitionExchangeManager#discoLsnr
> It creates a new GridDhtPartitionsExchangeFuture with the empty DiscoCache
> received with the event and enqueues it
> 14) The future is eventually picked up by the ExchangeWorker and
> initialized (GridDhtPartitionsExchangeFuture#init)
> 15) updateTopologies is called, which for each GridCacheContext gets its
> topology (GridDhtPartitionTopology)
> and calls GridDhtPartitionTopology#updateTopologyVersion
> 16) The DiscoCache for GridDhtPartitionTopology is taken from the
> GridDhtPartitionsExchangeFuture.
> The assigned DiscoCache has empty alives at this moment
> 17) A distributed exchange is handled
> (GridDhtPartitionsExchangeFuture#distributedExchange)
> 18) For each cache context GridCacheContext, for its topology
> (GridDhtPartitionTopologyImpl) GridDhtPartitionTopologyImpl#beforeExchange is
> called
> 19) The fact that the node has left is determined and
> GridDhtPartitionTopologyImpl#removeNode is called to handle it
> 20) An attempt is made to get the alive coordinator node by calling
> DiscoCache#oldestAliveServerNode
> 21) null is returned which results in an AssertionError
> The fix should probably prevent exchange futures from being initiated once
> a node has become segmented.
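
The end of the sequence above can be reproduced with a toy model: a cache of alive nodes built against a cleared ring is empty, so the lookup for the oldest alive server node returns null. The sketch below is not Ignite code. `SegmentedExchangeSketch`, `Node`, `DiscoCacheModel`, and `shouldStartExchange` are hypothetical stand-ins, and the guard only illustrates the suggestion in the report, assuming a segmentation flag is available at the point where exchanges are started.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of the failure; every type here is a hypothetical stand-in, not Ignite code. */
final class SegmentedExchangeSketch {
    /** Stand-in for a server node: an id plus its join order. */
    static final class Node {
        final String id;
        final long order;

        Node(String id, long order) {
            this.id = id;
            this.order = order;
        }
    }

    /** Stand-in for DiscoCache: keeps only the snapshot nodes that the ring still lists. */
    static final class DiscoCacheModel {
        private final List<Node> aliveSrvNodes = new ArrayList<>();

        DiscoCacheModel(List<Node> snapshot, Set<String> ringIds) {
            for (Node n : snapshot) {
                if (ringIds.contains(n.id))
                    aliveSrvNodes.add(n);
            }
        }

        /** Returns the alive node with the lowest order, or null when nothing is alive. */
        Node oldestAliveServerNode() {
            Node oldest = null;

            for (Node n : aliveSrvNodes) {
                if (oldest == null || n.order < oldest.order)
                    oldest = n;
            }

            return oldest;
        }
    }

    /** Guard in the spirit of the report's suggestion: skip exchanges once segmented. */
    static boolean shouldStartExchange(boolean segmented) {
        return !segmented;
    }

    public static void main(String[] args) {
        List<Node> remotes = new ArrayList<>();
        remotes.add(new Node("remote-1", 1));
        remotes.add(new Node("remote-2", 2));

        // After segmentation the ring no longer lists any remote node.
        DiscoCacheModel cache = new DiscoCacheModel(remotes, Collections.<String>emptySet());

        // An assertion that this value is non-null would fail here.
        System.out.println("oldest alive = " + cache.oldestAliveServerNode());
        System.out.println("start exchange = " + shouldStartExchange(true));
    }
}
```

In this model the null comes from filtering against the cleared ring, so either the cache has to keep at least one node or the exchange must not be started at all.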
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)