Hello Marcus,

It looks like a bug. I will check and file a JIRA issue.
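For context: the stack trace shows a query thread performing an unbounded
scan over the _key_PK index of LIMIT_DASHBOARD_SNAPSHOT ("Runtime failure
on bounds: [lower=null, upper=null]" means the scan had no lower or upper
bound). While walking the tree, the node tried to resolve affinity for
AffinityTopologyVersion [topVer=17, minorTopVer=28], but after the two
nodes left, the affinity history only retained versions 18 and 19, so the
IllegalStateException was wrapped into a CorruptedTreeException even
though the on-disk tree itself is most likely intact.

For illustration, a query of roughly the following shape would go through
that code path. This is a hypothetical sketch only; the actual SQL schema
of the cache is not shown in this thread, so the table name is assumed to
match the cache name:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class UnboundedScanExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Cache name taken from the log; key/value types are unknown here.
        IgniteCache<Object, Object> cache = ignite.cache("LIMIT_DASHBOARD_SNAPSHOT");

        // A full-table SELECT has no index bounds, so it walks the _key_PK
        // B+Tree via BPlusTree.findLowerUnbounded(), as in the trace.
        SqlFieldsQuery qry = new SqlFieldsQuery("SELECT * FROM LIMIT_DASHBOARD_SNAPSHOT");

        try (QueryCursor<List<?>> cursor = cache.query(qry)) {
            for (List<?> row : cursor)
                System.out.println(row);
        }
    }
}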

> I suppose Ignite should be fault tolerant, and an outage of some nodes
> should not cause other nodes to shut down.
You are absolutely right.
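
On why the node stopped rather than staying up: the log shows the node's
failure handler is StopNodeOrHaltFailureHandler with the default ignored
set (SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT). A
CorruptedTreeException is reported as CRITICAL_ERROR, which is not in
that set, so the handler halts the JVM. Below is a minimal sketch of that
configuration, assuming the stock Ignite 2.10 failure-handling API; the
tryStop=false, timeout=0 values come straight from your log:

import java.util.EnumSet;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class FailureHandlerConfig {
    public static void main(String[] args) {
        // Matches "hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0]"
        // from the log: halt the JVM immediately instead of a graceful stop.
        StopNodeOrHaltFailureHandler hnd = new StopNodeOrHaltFailureHandler(false, 0);

        // These two types are ignored by default; CRITICAL_ERROR is not,
        // which is why the CorruptedTreeException stopped the node.
        hnd.setIgnoredFailureTypes(EnumSet.of(
            FailureType.SYSTEM_WORKER_BLOCKED,
            FailureType.SYSTEM_CRITICAL_OPERATION_TIMEOUT));

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setFailureHandler(hnd);

        Ignition.start(cfg);
    }
}

As a stopgap you could add FailureType.CRITICAL_ERROR to the ignored set
to keep the node alive, but that only masks the stale affinity history
problem, so I would not recommend it as a fix.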

Thanks,
S.


Wed, Apr 6, 2022 at 10:49, Lo, Marcus <marcus...@citi.com>:

> Hi Ignite team,
>
>
>
> Can you please advise if there is anything that we can check on the
> below? Thanks.
>
>
>
> Regards,
>
> Marcus
>
>
>
> *From:* Lo, Marcus [ICG-IT]
> *Sent:* Wednesday, March 30, 2022 11:55 AM
> *To:* user
> *Subject:* Node crashed with error "Getting affinity for too old topology
> version that is already out of history"
>
>
>
> Hi,
>
>
>
> We are using Ignite 2.10.0 and have 5 nodes (with consistentId/hostname -
> lrdeqprmap01p, lrdeqprmap02p, lrdeqprmap03p, lcgeqprmap03c, lcgeqprmap04c)
> in the cluster. At one point, two of the nodes (lcgeqprmap03c,
> lcgeqprmap04c) went down due to a power outage. Somehow another node,
> lrdeqprmap03p, shut down shortly after that, with the following error:
>
>
>
> 2022-03-29 14:32:01.996+0100 ERROR
> [query-#194160%Prism%]                                          : Critical
> system error detected. Will be handled accordingly to configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is
> corrupted [pages(groupId, pageId)=[], cacheId=388652627,
> cacheName=LIMIT_DASHBOARD_SNAPSHOT, indexName=_key_PK, msg=Runtime failure
> on bounds: [lower=null, upper=null]]]]
>
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
> B+Tree is corrupted [pages(groupId, pageId)=[], cacheId=388652627,
> cacheName=LIMIT_DASHBOARD_SNAPSHOT, indexName=_key_PK, msg=Runtime failure
> on bounds: [lower=null, upper=null]]
>
>                 at
> org.apache.ignite.internal.processors.query.h2.database.H2Tree.corruptedTreeException(H2Tree.java:977)
> ~[ignite-indexing-2.10.0.jar:2.10.0]
>
>                 at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133)
> ~[ignite-core-2.10.0.jar:2.10.0]
>
>                 at
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.find(H2TreeIndex.java:415)
> ~[ignite-indexing-2.10.0.jar:2.10.0]
>
> …
>
> Caused by:
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
> java.lang.IllegalStateException: Getting affinity for too old topology
> version that is already out of history [locNode=TcpDiscoveryNode
> [id=e21d561d-314a-4240-a379-23f139870717, consistentId=lrdeqprmap03p,
> addrs=ArrayList [127.0.0.1, 169.182.110.133], sockAddrs=HashSet [/
> 127.0.0.1:47500, lrdeqprmap03p.eur.nsroot.net/169.182.110.133:47500],
> discPort=47500, order=7, intOrder=7, lastExchangeTime=1648560721652,
> loc=true, ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false],
> grp=LimitDashboardSnapshotCache, topVer=AffinityTopologyVersion [topVer=17,
> minorTopVer=28], lastAffChangeTopVer=AffinityTopologyVersion [topVer=17,
> minorTopVer=28], head=AffinityTopologyVersion [topVer=19, minorTopVer=0],
> history=[AffinityTopologyVersion [topVer=18, minorTopVer=0],
> AffinityTopologyVersion [topVer=19, minorTopVer=0]]]
>
>                 at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079)
> ~[ignite-core-2.10.0.jar:2.10.0]
>
>                 at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118)
> ~[ignite-core-2.10.0.jar:2.10.0]
>
>                 ... 23 more
>
> Caused by: java.lang.IllegalStateException: Getting affinity for too old
> topology version that is already out of history [locNode=TcpDiscoveryNode
> [id=e21d561d-314a-4240-a379-23f139870717, consistentId=lrdeqprmap03p,
> addrs=ArrayList [127.0.0.1, 169.182.110.133], sockAddrs=HashSet [/
> 127.0.0.1:47500, lrdeqprmap03p.eur.nsroot.net/169.182.110.133:47500],
> discPort=47500, order=7, intOrder=7, lastExchangeTime=1648560721652,
> loc=true, ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false],
> grp=LimitDashboardSnapshotCache, topVer=AffinityTopologyVersion [topVer=17,
> minorTopVer=28], lastAffChangeTopVer=AffinityTopologyVersion [topVer=17,
> minorTopVer=28], head=AffinityTopologyVersion [topVer=19, minorTopVer=0],
> history=[AffinityTopologyVersion [topVer=18, minorTopVer=0],
> AffinityTopologyVersion [topVer=19, minorTopVer=0]]]
>
>                 at
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:831)
> ~[ignite-core-2.10.0.jar:2.10.0]
>
>                 at
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:778)
> ~[ignite-core-2.10.0.jar:2.10.0]
>
>                 at
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:686)
> ~[ignite-core-2.10.0.jar:2.10.0]
>
>                 ...
>
>
>
> I suppose Ignite should be fault tolerant, and an outage of some nodes
> should not cause other nodes to shut down. Can you please advise? I have
> attached the full log for your reference. Thanks.
>
>
>
> Regards,
>
> Marcus
>
>
>
