[
https://issues.apache.org/jira/browse/IGNITE-19410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Petrov updated IGNITE-19410:
------------------------------------
Summary: Node failure in case multiple nodes join and leave a cluster
simultaneously with security is enabled. (was: Node failure in case multiple
nodes join and leave a cluster simultaneously and security is enabled.)
> Node failure in case multiple nodes join and leave a cluster simultaneously
> with security is enabled.
> ------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-19410
> URL: https://issues.apache.org/jira/browse/IGNITE-19410
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Petrov
> Priority: Major
> Attachments: NodeSecurityContextTest.java
>
>
> When nodes with security enabled join and leave the cluster
> simultaneously, the joining nodes can fail with the following
> exception:
> {code:java}
> [2023-05-03T14:54:31,208][ERROR][disco-notifier-worker-#332%ignite.NodeSecurityContextTest2%][IgniteTestResources]
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
> err=java.lang.IllegalStateException: Failed to find security context for
> subject with given ID : 4725544a-f144-4486-a705-46b2ac200011]]
> java.lang.IllegalStateException: Failed to find security context for subject
> with given ID : 4725544a-f144-4486-a705-46b2ac200011
> at
> org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.withContext(IgniteSecurityProcessor.java:164)
> ~[classes/:?]
> at
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$3$SecurityAwareNotificationTask.run(GridDiscoveryManager.java:949)
> ~[classes/:?]
> at
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2822)
> ~[classes/:?]
> at
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2860)
> [classes/:?]
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
> [classes/:?]
> at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351] {code}
> Reproducer is attached.
> Simplified steps that lead to the failure:
> 1. The client node sends an arbitrary discovery message that produces an
> acknowledgement message once it has been processed by all cluster nodes.
> 2. The client node gracefully leaves the cluster.
> 3. The new node joins the cluster and receives a topology snapshot that does
> not include the client node that left.
> 4. The new node receives the acknowledgement for the message from step 1
> and fails while processing it because the message originator node is not listed
> in the current discovery cache or discovery cache history (see
> IgniteSecurityProcessor#withContext(java.util.UUID)). This is because
> the GridDiscoveryManager#historicalNode method is currently only aware of the
> topology history that accumulates after a node has joined the cluster. The
> complete cluster topology history that exists at the time a new node joins
> the cluster is stored in GridDiscoveryManager#topHist and is not taken into
> account by the GridDiscoveryManager#historicalNode method.
>
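The lookup gap described in step 4 can be illustrated with a small standalone model. This is only a sketch and does not use Ignite code: the class HistoricalNodeSketch, its fields, and its methods are hypothetical names, and joinTimeHistory merely stands in for the role the report attributes to GridDiscoveryManager#topHist. It assumes that a node can be resolved if it appears in the current snapshot or in some recorded snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Toy model of the lookup gap. None of these names are Ignite APIs. */
public class HistoricalNodeSketch {
    /** Nodes present in the current topology snapshot. */
    private final Set<UUID> current = new HashSet<>();

    /** Snapshots recorded locally, starting from the moment this node joined. */
    private final List<Set<UUID>> localHistory = new ArrayList<>();

    /** History handed over at join time (hypothetical stand-in for topHist). */
    private final List<Set<UUID>> joinTimeHistory = new ArrayList<>();

    /** Simulates joining: receive the current snapshot plus the older history. */
    public void onJoin(Set<UUID> snapshot, List<Set<UUID>> olderHistory) {
        current.clear();
        current.addAll(snapshot);
        localHistory.add(new HashSet<>(snapshot));
        for (Set<UUID> s : olderHistory)
            joinTimeHistory.add(new HashSet<>(s));
    }

    /** Lookup that only sees the current snapshot and the post-join history. */
    public boolean knownLocally(UUID id) {
        return current.contains(id) || containsIn(localHistory, id);
    }

    /** Lookup that additionally consults the history received at join time. */
    public boolean knownWithJoinHistory(UUID id) {
        return knownLocally(id) || containsIn(joinTimeHistory, id);
    }

    private static boolean containsIn(List<Set<UUID>> hist, UUID id) {
        for (Set<UUID> s : hist) {
            if (s.contains(id))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        UUID server = UUID.randomUUID();
        UUID client = UUID.randomUUID();   // Sends a message, then leaves.
        UUID newNode = UUID.randomUUID();  // Joins after the client has left.

        // Topology as it was before the new node joined: the client was present.
        List<Set<UUID>> olderHistory = new ArrayList<>();
        olderHistory.add(new HashSet<>(Arrays.asList(server, client)));

        // Snapshot delivered to the new node: the client is already gone.
        Set<UUID> snapshot = new HashSet<>(Arrays.asList(server, newNode));

        HistoricalNodeSketch node = new HistoricalNodeSketch();
        node.onJoin(snapshot, olderHistory);

        System.out.println("post-join history only: " + node.knownLocally(client));
        System.out.println("with join-time history: " + node.knownWithJoinHistory(client));
    }
}
```

In this model the first lookup cannot resolve the departed client, which corresponds to the IllegalStateException in the report, while the second lookup can. Whether the actual fix should consult topHist in this way is not stated in the issue.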