denis-chudov commented on code in PR #5825: URL: https://github.com/apache/ignite-3/pull/5825#discussion_r2096310737
########## modules/partition-distribution/src/main/java/org/apache/ignite/internal/partitiondistribution/PendingAssignmentsCalculator.java: ########## @@ -49,15 +53,79 @@ public PendingAssignmentsCalculator target(Assignments target) { } /** - * Calculates the pending assignments queue between current stable and target assignments. + * Calculates the pending assignments queue between current stable (not included) and target assignments (included, last element). */ public AssignmentsQueue toQueue() { assert stable != null; assert target != null; Review Comment: Let's make the both fields final. ########## modules/distribution-zones/src/main/java/org/apache/ignite/internal/distributionzones/rebalance/RebalanceRaftGroupEventsListener.java: ########## @@ -308,13 +309,56 @@ private void doStableKeySwitch( Entry switchAppendEntry = values.get(switchAppendKey); Entry assignmentsChainEntry = values.get(assignmentsChainKey); + AssignmentsQueue pendingAssignmentsQueue = AssignmentsQueue.fromBytes(pendingEntry.value()); + + if (pendingAssignmentsQueue != null && pendingAssignmentsQueue.size() > 1) { + + if (pendingAssignmentsQueue.peekFirst().nodes().equals(stableFromRaft)) { + pendingAssignmentsQueue.poll(); // remove, first element was already applied to the configuration by pending listeners + } + + Assignments stable = Assignments.of(stableFromRaft, pendingAssignmentsQueue.peekFirst().timestamp()); + pendingAssignmentsQueue = PendingAssignmentsCalculator.pendingAssignmentsCalculator() + .stable(stable) + .target(pendingAssignmentsQueue.peekLast()) + .toQueue(); + + boolean updated = metaStorageMgr.invoke(iif( + revision(pendingPartAssignmentsKey).eq(pendingEntry.revision()), + ops(put(pendingPartAssignmentsKey, pendingAssignmentsQueue.toBytes())).yield(true), + ops().yield(false) + )) + .get().getAsBoolean(); Review Comment: Please add the TODO here: `// TODO: https://issues.apache.org/jira/browse/IGNITE-17592 Remove synchronous wait` Same for zones' listener ########## modules/partition-distribution/src/main/java/org/apache/ignite/internal/partitiondistribution/PendingAssignmentsCalculator.java: ########## @@ -49,15 +53,79 @@ public PendingAssignmentsCalculator target(Assignments target) { } /** - * Calculates the pending assignments queue between current stable and target assignments. + * Calculates the pending assignments queue between current stable (not included) and target assignments (included, last element). */ public AssignmentsQueue toQueue() { assert stable != null; assert target != null; - // naive version with only one configuration switch, calculation will be changed during IGNITE-23790 epic - AssignmentsQueue queue = new AssignmentsQueue(target); - assert Objects.equals(queue.peekLast(), target) : "Target assignments should be equal to the last element in the queue"; + if (target.force()) { + return new AssignmentsQueue(target); + } Review Comment: Why so? Are you sure that such transition is always possible? ########## modules/distribution-zones/src/main/java/org/apache/ignite/internal/distributionzones/rebalance/RebalanceRaftGroupEventsListener.java: ########## @@ -64,6 +64,7 @@ import org.apache.ignite.internal.partitiondistribution.Assignments; Review Comment: The logging of pending assignments is broken now. See `Received update on pending assignments` in TableManager: there is only first element of the queue logged. Same with logging of stable assignments change event: see `Received update on stable assignments` ########## modules/distribution-zones/src/integrationTest/java/org/apache/ignite/internal/rebalance/ItRebalanceByPendingAssignmentsQueueTest.java: ########## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.internal.rebalance; + +import static java.util.Optional.ofNullable; +import static org.apache.ignite.internal.TestRebalanceUtil.pendingPartitionAssignmentsKey; +import static org.apache.ignite.internal.TestRebalanceUtil.stablePartitionAssignments; +import static org.apache.ignite.internal.TestWrappers.unwrapIgniteImpl; +import static org.apache.ignite.internal.TestWrappers.unwrapTableViewInternal; +import static org.apache.ignite.internal.lang.IgniteSystemProperties.enabledColocation; +import static org.apache.ignite.internal.partitiondistribution.PendingAssignmentsCalculator.pendingAssignmentsCalculator; +import static org.apache.ignite.internal.rebalance.ItRebalanceByPendingAssignmentsQueueTest.AssignmentsRecorder.recordAssignmentsEvents; +import static org.apache.ignite.internal.testframework.matchers.CompletableFutureMatcher.willCompleteSuccessfully; +import static org.apache.ignite.internal.util.CompletableFutures.nullCompletedFuture; +import static org.awaitility.Awaitility.await; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; +import static org.hamcrest.Matchers.greaterThanOrEqualTo; +import static org.hamcrest.Matchers.hasSize; + +import java.util.Deque; +import java.util.HashSet; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentLinkedDeque; +import java.util.function.Consumer; +import org.apache.ignite.Ignite; +import org.apache.ignite.internal.ClusterPerTestIntegrationTest; +import org.apache.ignite.internal.app.IgniteImpl; +import org.apache.ignite.internal.catalog.Catalog; +import org.apache.ignite.internal.catalog.CatalogManager; +import org.apache.ignite.internal.catalog.descriptors.CatalogZoneDescriptor; +import org.apache.ignite.internal.distributionzones.rebalance.RebalanceUtil; +import org.apache.ignite.internal.distributionzones.rebalance.ZoneRebalanceUtil; +import org.apache.ignite.internal.lang.ByteArray; +import org.apache.ignite.internal.metastorage.Entry; +import org.apache.ignite.internal.metastorage.MetaStorageManager; +import org.apache.ignite.internal.metastorage.WatchEvent; +import org.apache.ignite.internal.metastorage.WatchListener; +import org.apache.ignite.internal.partitiondistribution.Assignment; +import org.apache.ignite.internal.partitiondistribution.Assignments; +import org.apache.ignite.internal.partitiondistribution.AssignmentsQueue; +import org.apache.ignite.internal.raft.RaftGroupEventsListener; +import org.apache.ignite.internal.replicator.PartitionGroupId; +import org.apache.ignite.internal.replicator.TablePartitionId; +import org.apache.ignite.internal.replicator.ZonePartitionId; +import org.apache.ignite.internal.table.TableViewInternal; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +/** + * Tests for rebalance - handling pending {@link AssignmentsQueue} by {@link RaftGroupEventsListener}. + */ +class ItRebalanceByPendingAssignmentsQueueTest extends ClusterPerTestIntegrationTest { Review Comment: This is nice basic test coverage, but I'm afraid it still may be not enough for such a serious change. Let's add some more integration rebalance test with promoting/demotion of nodes (through zone replicas, quorum size, filter changes, for example): - Scale up scenarios: if there are not enough peers, then peers should be added, otherwise learners should be added; - Scale down scenarios: if there are learners, and the peer node goes offline, one of the learners should be turned into peer. This should not cause the actual data rebalancing (but the process should pass the regular assignments’ switch flow); - Multi-step rebalance should be checked when a group leader is changed in the middle of pending assignments queue processing (checking that nothing is broken in ZoneRebalanceRaftGroupEventsListener#onLeaderElected ); - Scenario of the node restart during the multi-step rebalance should be checked (making sure that nothing is broken on node recovery, including the processing of pending assignments queue during the node recovery); - Disaster recovery: if the majority of peers is lost, and there is no ability to turn a learner into peer (they may be offline, outdated, etc.), then disaster recovery should work as for regular zones without learners; - Disaster recovery: the case when there are less nodes than configured quorum size, meaning that the configured value will be overridden. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@ignite.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org