[ https://issues.apache.org/jira/browse/IGNITE-19519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vadim Pakhnushev updated IGNITE-19519: -------------------------------------- Description: h3. Deployment unit removal process The deployment unit must be removed from all cluster nodes. In order to achieve this the following must be implemented: # Change {{clusterDURecord.status}} to {{OBSOLETE}} value. This operation could fail because another process has already changed status to {{OBSOLETE}} or {{REMOVING}} value. It is also impossible to start an undeployment process in case the deployment process is still in progress. After this step the deployment unit is not available for new code execution. Code execution in progress still can use this deployment unit. # Meta storage event must be fired to all target nodes due to a change of {{{}clusterDURecord.status{}}}. # After receiving this event by the target node the system must change {{nodeDURecord.status}} to {{OBSOLETE}} value. # The node waits for finishing of all code executions in progress that depend on this deployment unit. As soon as all code executions are finished {{nodeDURecord.status}} must be changed to {{REMOVING}} value. >From this point it is impossible to use the deployment unit for code execution >neither for new tasks nor for old tasks (the second is impossible due to the >invariant that all old tasks are finished). # For each change of {{nodeDURecord.status}} to {{REMOVING}} value the system is able to receive an event from meta storage and check that all nodes have {{{}nodeDURecord.status == REMOVING{}}}. If the condition is met then {{clusterDURecord.status}} must be changed to {{REMOVING}} too. # Now the deployment unit can be removed from each target node and, after it, remove corresponding status records. # For each removal of {{nodeDURecord}} record from meta storage the system is able to receive an event from meta storage and check that there are no any {{nodeDURecord}} records for the given deployment unit. Now the system must remove the {{clusterDURecord}} record for the deployment unit. Note that If the deployment unit was removed then there are no any class loaders associated with this deployment unit. Eventually the class loader should be collected by GC and all classes must be unloaded from JVM. It is the critical requirement in order to avoid memory leaks related to multiple class loading/unloading. h3. Node restart during unit removal process If a target node was restarted during deployment unit removal process then the node must find all deployment units with {{clusterDURecord.status == OBSOLETE}} or {{clusterDURecord.status == REMOVING}} for restarted node and finish deployment unit removal process as described in the previous section. was: h3. Deployment unit removal process The deployment unit must be removed from all cluster nodes. In order to achieve this the following must be implemented: # Change {{clusterDURecord.status}} to {{OBSOLETE}} value. This operation could fail because another process has already changed status to {{OBSOLETE}} or {{REMOVING}} value. It is also impossible to start an undeployment process in case the deployment process is still in progress. After this step the deployment unit is not available for new code execution. Code execution in progress still can use this deployment unit. # Meta storage event must be fired to all target nodes due to a change of {{{}clusterDURecord.status{}}}. # After receiving this event by the target node the system must change {{nodeDURecord.status}} to {{OBSOLETE}} value. # The node waits for finishing of all code executions in progress that depend on this deployment unit. As soon as all code executions are finished {{nodeDURecord.status}} must be changed to {{REMOVING}} value. >From this point it is impossible to use the deployment unit for code execution >neither for new tasks nor for old tasks (the second is impossible due to the >invariant that all old tasks are finished). # For each change of {{nodeDURecord.status}} to {{REMOVING}} value the system is able to receive an event from meta storage and check that all nodes have {{{}nodeDURecord.status == REMOVING{}}}. If the condition is met then {{clusterDURecord.status}} must be changed to {{REMOVING}} too. # Now the deployment unit can be removed from each target node and, after it, remove corresponding status records. # For each removal of {{nodeDURecord}} record from meta storage the system is able to receive an event from meta storage and check that there are no any {{nodeDURecord}} records for the given deployment unit. Now the system must remove the {{clusterDURecord}} record for the deployment unit. Note that If the deployment unit was removed then there are no any class loaders associated with this deployment unit. Eventually the class loader should be collected by GC and all classes must be unloaded from JVM. It is the critical requirement in order to avoid memory leaks related to multiple class loading/unloading. h3. Node restart during unit removal process If a target node was restarted during deployment unit removal process then the node must find all deployment units with clusterDURecord.status == OBSOLETE or clusterDURecord.status == REMOVING for restarted node and finish deployment unit removal process as described in the previous section. > Deployment unit removal > ----------------------- > > Key: IGNITE-19519 > URL: https://issues.apache.org/jira/browse/IGNITE-19519 > Project: Ignite > Issue Type: New Feature > Reporter: Mikhail Pochatkin > Assignee: Vadim Pakhnushev > Priority: Major > Labels: iep-103, ignite-3 > > h3. Deployment unit removal process > The deployment unit must be removed from all cluster nodes. In order to > achieve this the following must be implemented: > # Change {{clusterDURecord.status}} to {{OBSOLETE}} value. This operation > could fail because another process has already changed status to {{OBSOLETE}} > or {{REMOVING}} value. It is also impossible to start an undeployment process > in case the deployment process is still in progress. > After this step the deployment unit is not available for new code execution. > Code execution in progress still can use this deployment unit. > # Meta storage event must be fired to all target nodes due to a change of > {{{}clusterDURecord.status{}}}. > # After receiving this event by the target node the system must change > {{nodeDURecord.status}} to {{OBSOLETE}} value. > # The node waits for finishing of all code executions in progress that > depend on this deployment unit. As soon as all code executions are finished > {{nodeDURecord.status}} must be changed to {{REMOVING}} value. > From this point it is impossible to use the deployment unit for code > execution neither for new tasks nor for old tasks (the second is impossible > due to the invariant that all old tasks are finished). > # For each change of {{nodeDURecord.status}} to {{REMOVING}} value the > system is able to receive an event from meta storage and check that all nodes > have {{{}nodeDURecord.status == REMOVING{}}}. If the condition is met then > {{clusterDURecord.status}} must be changed to {{REMOVING}} too. > # Now the deployment unit can be removed from each target node and, after > it, remove corresponding status records. > # For each removal of {{nodeDURecord}} record from meta storage the system > is able to receive an event from meta storage and check that there are no any > {{nodeDURecord}} records for the given deployment unit. Now the system must > remove the {{clusterDURecord}} record for the deployment unit. > > Note that If the deployment unit was removed then there are no any class > loaders associated with this deployment unit. Eventually the class loader > should be collected by GC and all classes must be unloaded from JVM. It is > the critical requirement in order to avoid memory leaks related to multiple > class loading/unloading. > h3. Node restart during unit removal process > If a target node was restarted during deployment unit removal process then > the node must find all deployment units with {{clusterDURecord.status == > OBSOLETE}} or {{clusterDURecord.status == REMOVING}} for restarted node and > finish deployment unit removal process as described in the previous section. -- This message was sent by Atlassian Jira (v8.20.10#820010)