[ https://issues.apache.org/jira/browse/IGNITE-19410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Pavlov updated IGNITE-19410:
-----------------------------------
    Labels: ise  (was: )

> Node failure in case multiple nodes join and leave a cluster simultaneously with security is enabled.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-19410
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19410
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Priority: Major
>              Labels: ise
>         Attachments: NodeSecurityContextTest.java
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The case when nodes with security enabled join and leave the cluster simultaneously can cause the joining nodes to fail with the following exception:
> {code:java}
> [2023-05-03T14:54:31,208][ERROR][disco-notifier-worker-#332%ignite.NodeSecurityContextTest2%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Failed to find security context for subject with given ID : 4725544a-f144-4486-a705-46b2ac200011]]
> java.lang.IllegalStateException: Failed to find security context for subject with given ID : 4725544a-f144-4486-a705-46b2ac200011
>     at org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.withContext(IgniteSecurityProcessor.java:164) ~[classes/:?]
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$3$SecurityAwareNotificationTask.run(GridDiscoveryManager.java:949) ~[classes/:?]
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2822) ~[classes/:?]
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2860) [classes/:?]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [classes/:?]
>     at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351]
> {code}
> A reproducer is attached.
> Simplified steps that lead to the failure:
> 1. The client node sends an arbitrary discovery message that produces an acknowledgement message once it has been processed by all cluster nodes.
> 2. The client node gracefully leaves the cluster.
> 3. A new node joins the cluster and receives a topology snapshot that does not include the departed client node.
> 4. The new node receives the acknowledgement for the message from step 1 and fails while processing it, because the originator node of that message is listed neither in the current discovery cache nor in the discovery cache history (see IgniteSecurityProcessor#withContext(java.util.UUID)). This happens because the GridDiscoveryManager#historicalNode method is currently aware only of topology changes that occur after the local node has joined the cluster. The complete cluster topology history that exists at the moment a new node joins the cluster is stored in GridDiscoveryManager#topHist, but it is not taken into account by the GridDiscoveryManager#historicalNode method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
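The lookup gap described in step 4 of the issue can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not Ignite code: `NodeLookupSketch`, `historicalNodeCurrent`, `historicalNodeWithTopHist`, `onLocalJoin` and the "topology version to set of node IDs" representation are all hypothetical. The two maps merely stand in for the discovery cache history and for GridDiscoveryManager#topHist, and the fallback lookup is one possible fix direction, not necessarily how the issue was actually resolved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.UUID;

/**
 * Hypothetical, heavily simplified model of the lookup described in IGNITE-19410.
 * None of these names are Ignite APIs; node IDs stand in for ClusterNode instances.
 */
public class NodeLookupSketch {
    static final UUID SERVER = UUID.fromString("00000000-0000-0000-0000-000000000001");
    static final UUID CLIENT = UUID.fromString("4725544a-f144-4486-a705-46b2ac200011");
    static final UUID NEW_NODE = UUID.fromString("00000000-0000-0000-0000-000000000003");

    /** Topologies observed locally, i.e. only from the local join onwards. */
    private final NavigableMap<Long, Set<UUID>> discoCacheHist = new TreeMap<>();

    /** Stand-in (assumption) for the full topology history received on join (topHist). */
    private final NavigableMap<Long, Set<UUID>> topHist = new TreeMap<>();

    void onLocalJoin(Map<Long, Set<UUID>> receivedTopHist, long joinTopVer, Set<UUID> joinTop) {
        topHist.putAll(receivedTopHist);
        discoCacheHist.put(joinTopVer, joinTop);
    }

    /** Behaviour described in the issue: only the post-join history is searched. */
    boolean historicalNodeCurrent(UUID nodeId) {
        return containsNode(discoCacheHist, nodeId);
    }

    /** Hypothetical fix direction: also search the history received on join. */
    boolean historicalNodeWithTopHist(UUID nodeId) {
        return containsNode(discoCacheHist, nodeId) || containsNode(topHist, nodeId);
    }

    private static boolean containsNode(Map<Long, Set<UUID>> hist, UUID nodeId) {
        for (Set<UUID> nodes : hist.values()) {
            if (nodes.contains(nodeId))
                return true;
        }
        return false;
    }

    /** Mimics the check that fails in IgniteSecurityProcessor#withContext(UUID). */
    void withContext(UUID subjId, boolean useTopHist) {
        boolean known = useTopHist ? historicalNodeWithTopHist(subjId) : historicalNodeCurrent(subjId);
        if (!known)
            throw new IllegalStateException("Failed to find security context for subject with given ID : " + subjId);
    }

    /** Steps 1-3: CLIENT joins (ver 2) and leaves (ver 3) before NEW_NODE joins (ver 4). */
    static NodeLookupSketch newNodeAfterClientLeft() {
        Map<Long, Set<UUID>> received = new TreeMap<>();
        received.put(1L, setOf(SERVER));
        received.put(2L, setOf(SERVER, CLIENT));
        received.put(3L, setOf(SERVER));

        NodeLookupSketch newNode = new NodeLookupSketch();
        newNode.onLocalJoin(received, 4L, setOf(SERVER, NEW_NODE));
        return newNode;
    }

    private static Set<UUID> setOf(UUID... ids) {
        return new HashSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        NodeLookupSketch newNode = newNodeAfterClientLeft();

        // Step 4: the acknowledgement names CLIENT as its originator.
        try {
            newNode.withContext(CLIENT, false);
            System.out.println("post-join lookup: found");
        } catch (IllegalStateException e) {
            System.out.println("post-join lookup: " + e.getMessage());
        }

        newNode.withContext(CLIENT, true); // does not throw: CLIENT is in the received history
        System.out.println("lookup with topHist: found");
    }
}
```

In this model the post-join-only lookup reproduces the failure from the stack trace, while consulting the history received on join finds the departed client. Checking the local history first keeps the existing behaviour for nodes seen after the join and only adds a fallback for nodes that left before it.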