[ https://issues.apache.org/jira/browse/SOLR-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856041#comment-17856041 ]
ASF subversion and git services commented on SOLR-17331:
--------------------------------------------------------

Commit 04acaca3e186e3a1e3f260bf3c3ac8ed32b1ff28 in solr's branch refs/heads/branch_9x from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=04acaca3e18 ]

SOLR-17331: More optimal placements with OrderedNodePlacementPlugin (#2515)

- Move tests, adding tests for the simple plugin

(cherry picked from commit fc0d84afaa8b49bd0515f796abd901e5150d5982)


> MigrateReplicasTest.testGoodSpreadDuringAssignWithNoTarget is flaky
> -------------------------------------------------------------------
>
>                 Key: SOLR-17331
>                 URL: https://issues.apache.org/jira/browse/SOLR-17331
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: Yohann Callea
>            Assignee: Houston Putman
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test *_MigrateReplicasTest.testGoodSpreadDuringAssignWithNoTarget_* is
> sometimes (< 3% failure rate) failing on its last assertion, as shown by the
> [trend history of test failures|#series/org.apache.solr.cloud.MigrateReplicasTest.testGoodSpreadDuringAssignWithNoTarget].
>
> This test spins up a 5-node cluster and creates a collection with 3 shards
> and a replication factor of 2.
> It then vacates 2 randomly chosen nodes using the Migrate Replicas command
> and, once the migration completes, expects the vacated nodes to host no
> replicas and the 6 replicas to be evenly spread across the 3 non-vacated
> nodes (i.e., 2 replicas on each node).
> However, this last assertion sometimes fails because the replicas are not
> evenly spread over the 3 non-vacated nodes.
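The last assertion boils down to comparing per-node replica counts. A minimal sketch of such a check, assuming the post-migration layout has already been reduced to a map from node name to replica count; SpreadCheck and isEvenSpread are hypothetical names, and this is not the actual MigrateReplicasTest code:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the intent of the test's final assertions.
// It is an illustration only and is not taken from MigrateReplicasTest.
final class SpreadCheck {

  /**
   * Returns true when no vacated node hosts a replica and the remaining
   * replicas are split evenly across the non-vacated nodes.
   */
  static boolean isEvenSpread(Map<String, Integer> replicasPerNode, Set<String> vacated) {
    int total = 0;
    int targets = 0;
    for (Map.Entry<String, Integer> e : replicasPerNode.entrySet()) {
      if (vacated.contains(e.getKey())) {
        if (e.getValue() != 0) {
          return false; // a vacated node still hosts replicas
        }
      } else {
        total += e.getValue();
        targets++;
      }
    }
    if (targets == 0 || total % targets != 0) {
      return false; // an even split is impossible
    }
    int expected = total / targets; // 6 replicas over 3 nodes gives 2 in the test
    for (Map.Entry<String, Integer> e : replicasPerNode.entrySet()) {
      if (!vacated.contains(e.getKey()) && e.getValue() != expected) {
        return false; // the situation reported as "expected:<2> but was:<1>"
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> vacated = Set.of("NODE_3", "NODE_4");
    // Layout reached at the end of the failing scenario in this issue: 2 / 3 / 1.
    Map<String, Integer> uneven =
        Map.of("NODE_0", 2, "NODE_1", 3, "NODE_2", 1, "NODE_3", 0, "NODE_4", 0);
    // Layout the test expects: 2 / 2 / 2.
    Map<String, Integer> even =
        Map.of("NODE_0", 2, "NODE_1", 2, "NODE_2", 2, "NODE_3", 0, "NODE_4", 0);
    System.out.println(isEvenSpread(uneven, vacated)); // false
    System.out.println(isEvenSpread(even, vacated)); // true
  }
}
```

Dividing the total by the number of non-vacated nodes, rather than hard-coding 2, keeps the sketch valid for other cluster sizes.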
> {code:java}
> The non-source node '127.0.0.1:36007_solr' has the wrong number of replicas after the migration expected:<2> but was:<1>
> {code}
>
> Analysing a failure in more detail shows that this test is inherently
> expected to fail under some circumstances, given how the Migrate Replicas
> command operates.
> When migrating replicas, the new positions of the replicas to be moved are
> calculated sequentially: each move is decided by the logic of the currently
> configured replica placement plugin, given the cluster state resulting from
> the previous moves.
> We can therefore end up in the following situation.
> h2. Failing scenario
> Note that this test always uses the default replica placement strategy, which
> is Simple as of today.
> Let's assume the following initial state after the collection is created.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |         |         |    X    |         |
> SHARD_2 |         |    X    |         |    X    |         |
> SHARD_3 |         |         |    X    |         |    X    |
> {code}
> The test now runs the migrate command to vacate *_NODE_3_* and
> *_NODE_4_*. It therefore needs to perform 3 replica movements to empty
> these two nodes.
> h4. Move 1
> We are moving the replica of *_SHARD_1_* positioned on *_NODE_3_*.
> *_NODE_0_* is not an eligible destination for this replica as this node
> already hosts a replica of *_SHARD_1_*, and both *_NODE_1_* and *_NODE_2_*
> can be chosen as they each host one replica.
> *_NODE_1_* is arbitrarily chosen amongst the two best candidate destination
> nodes.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |    X    |         |         |         |
> SHARD_2 |         |    X    |         |    X    |         |
> SHARD_3 |         |         |    X    |         |    X    |
> {code}
> h4. Move 2
> We are moving the replica of *_SHARD_2_* positioned on *_NODE_3_*.
> *_NODE_1_* is not an eligible destination for this replica as this node
> already hosts a replica of *_SHARD_2_*, and both *_NODE_0_* and *_NODE_2_*
> can be chosen as they each host one replica.
> *_NODE_0_* is arbitrarily chosen amongst the two best candidate destination
> nodes.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |    X    |         |         |         |
> SHARD_2 |    X    |    X    |         |         |         |
> SHARD_3 |         |         |    X    |         |    X    |
> {code}
> h4. Move 3
> We are moving the replica of *_SHARD_3_* positioned on *_NODE_4_*.
> *_NODE_2_* is not an eligible destination for this replica as this node
> already hosts a replica of *_SHARD_3_*, and both *_NODE_0_* and *_NODE_1_*
> can be chosen as they each host two replicas.
> *_NODE_1_* is arbitrarily chosen amongst the two best candidate destination
> nodes.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |    X    |         |         |         |
> SHARD_2 |    X    |    X    |         |         |         |
> SHARD_3 |         |    X    |    X    |         |         |
> {code}
>
> The test then fails because the replicas are not evenly positioned across
> the non-vacated nodes (*_NODE_0_*: 2, *_NODE_1_*: 3, *_NODE_2_*: 1), even
> though this is arguably the expected outcome given the Simple placement
> strategy implementation.
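Since each destination is fixed before the next one is computed, the scenario above can be replayed with a small model. The sketch below is a simplified, hypothetical model and not Solr's Simple placement plugin: it assumes, as the walkthrough describes, that a replica goes to a non-vacated node that does not already host its shard and that hosts the fewest replicas, and it exposes the arbitrary tie-break as a function. SequentialMigration, migrate, initialLayout and tieBreak are illustrative names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Simplified, hypothetical model of sequential replica migration; it is not
// Solr's Simple placement plugin. Each replica on a vacated node is moved, one
// at a time, to a non-vacated node that does not host its shard yet and that
// currently hosts the fewest replicas. Ties are resolved by tieBreak.
final class SequentialMigration {

  static final Set<String> VACATED = Set.of("NODE_3", "NODE_4");

  // Initial layout from the issue: 3 shards with a replication factor of 2.
  static Map<String, Set<String>> initialLayout() {
    return Map.of(
        "NODE_0", Set.of("SHARD_1"),
        "NODE_1", Set.of("SHARD_2"),
        "NODE_2", Set.of("SHARD_3"),
        "NODE_3", Set.of("SHARD_1", "SHARD_2"),
        "NODE_4", Set.of("SHARD_3"));
  }

  /** Returns the number of replicas per node once all vacated nodes are empty. */
  static Map<String, Integer> migrate(
      Map<String, Set<String>> layout, Set<String> vacated,
      Function<List<String>, String> tieBreak) {
    // Mutable copy, sorted by node and shard name for a deterministic order.
    Map<String, Set<String>> nodes = new TreeMap<>();
    layout.forEach((node, shards) -> nodes.put(node, new TreeSet<>(shards)));

    for (String source : new TreeSet<>(vacated)) {
      for (String shard : new ArrayList<>(nodes.get(source))) {
        // Collect the eligible nodes hosting the fewest replicas.
        int min = Integer.MAX_VALUE;
        List<String> best = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : nodes.entrySet()) {
          if (vacated.contains(e.getKey()) || e.getValue().contains(shard)) {
            continue; // vacated, or already hosts a replica of this shard
          }
          int count = e.getValue().size();
          if (count < min) {
            min = count;
            best.clear();
          }
          if (count == min) {
            best.add(e.getKey());
          }
        }
        if (best.isEmpty()) {
          throw new IllegalStateException("No eligible node for " + shard);
        }
        String target = tieBreak.apply(List.copyOf(best));
        if (!best.contains(target)) {
          throw new IllegalArgumentException(target + " is not among " + best);
        }
        // The decision is final: later moves only see the resulting state.
        nodes.get(source).remove(shard);
        nodes.get(target).add(shard);
      }
    }

    Map<String, Integer> counts = new TreeMap<>();
    nodes.forEach((node, shards) -> counts.put(node, shards.size()));
    return counts;
  }

  public static void main(String[] args) {
    // Replays the tie-breaks from the issue: NODE_1, then NODE_0, then NODE_1.
    Iterator<String> picks = List.of("NODE_1", "NODE_0", "NODE_1").iterator();
    System.out.println(migrate(initialLayout(), VACATED, candidates -> picks.next()));
    // {NODE_0=2, NODE_1=3, NODE_2=1, NODE_3=0, NODE_4=0}

    // Picking the last tied candidate instead happens to produce an even spread.
    System.out.println(migrate(initialLayout(), VACATED, c -> c.get(c.size() - 1)));
    // {NODE_0=2, NODE_1=2, NODE_2=2, NODE_3=0, NODE_4=0}
  }
}
```

Because every move is decided greedily from the state left by the previous ones, the final spread in this model depends only on how ties are broken: replaying the issue's choices reproduces the 2 / 3 / 1 layout behind "expected:<2> but was:<1>", while sending the first replica to NODE_2 instead of NODE_1 leads to 2 / 2 / 2.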