[ https://issues.apache.org/jira/browse/SOLR-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743281#comment-17743281 ]
Houston Putman commented on SOLR-16753:
---------------------------------------

So I'm not sure why the test was passing initially, but with DEBUG logging you can tell why this test fails:

{code:java}
2023-07-14 16:26:56 2> 20611 INFO (recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10) [n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10 x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.SliceMutator Update shard state shard1_1 to active
2023-07-14 16:26:56 2> 20611 INFO (recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10) [n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10 x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.SliceMutator Update shard state shard1_0 to active
2023-07-14 16:26:56 2> 20611 INFO (recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10) [n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10 x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.SliceMutator Update shard state shard1 to inactive
{code}
...
{code:java}
2023-07-14 16:26:56 2> 20612 DEBUG (recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10) [n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10 x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.ReplicaMutator state.json is not persisted slice/replica : shard1_0/core_node10
2023-07-14 16:26:56 2> , old : {
2023-07-14 16:26:56 2>   "core":"coll_NRT_PULL_shard1_0_replica_p1",
2023-07-14 16:26:56 2>   "node_name":"127.0.0.1:37169_solr",
2023-07-14 16:26:56 2>   "base_url":"http://127.0.0.1:37169/solr",
2023-07-14 16:26:56 2>   "state":"down",
2023-07-14 16:26:56 2>   "type":"PULL",
2023-07-14 16:26:56 2>   "force_set_state":"false"},
2023-07-14 16:26:56 2> new {
2023-07-14 16:26:56 2>   "core":"coll_NRT_PULL_shard1_0_replica_p1",
2023-07-14 16:26:56 2>   "node_name":"127.0.0.1:37169_solr",
2023-07-14 16:26:56 2>   "base_url":"http://127.0.0.1:37169/solr",
2023-07-14 16:26:56 2>   "state":"active",
2023-07-14 16:26:56 2>   "type":"PULL",
2023-07-14 16:26:56 2>   "force_set_state":"false"}
{code}

Basically, the replica state mutation is what causes the shards to become active. I don't fully understand {{ReplicaMutator.persistStateJson()}}, but for some reason it only chooses to update the {{state.json}} if the slice's state is "recovering"... Clearly the state also needs to be updated when it changes from "recovering" to "active". With this method returning false, the {{state.json}} is never updated with the active slices.

I'm going to make a PR, but the fix is pretty simple: update the state.json if the slice's state changes.

> SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull failures
> -----------------------------------------------------------------------
>
>                 Key: SOLR-16753
>                 URL: https://issues.apache.org/jira/browse/SOLR-16753
>             Project: Solr
>          Issue Type: Test
>            Reporter: Chris M. Hostetter
>            Assignee: Noble Paul
>            Priority: Major
>         Attachments: SOLR-16753.txt, Skjermbilde 2023-05-03 kl. 12.24.56.png
>
>
> {{SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull}} was
> added on 2023-03-13, but somewhere between 2023-04-02 and 2023-04-09 it
> started failing 15-20% of the time on jenkins jobs, with seeds that don't
> reliably reproduce.
> At first, this seemed like it might be related to SOLR-16751, but even with
> that fix, failures are still happening.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
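The persistence decision described in the comment can be sketched as a tiny predicate. This is a hypothetical simplification for illustration only, not the actual {{ReplicaMutator}} code; the class and method names below are invented:

```java
// Sketch of the state.json persistence decision described in the comment.
// Hypothetical simplification; the real logic lives in
// org.apache.solr.cloud.overseer.ReplicaMutator.persistStateJson().
public class SliceStatePersistSketch {

    // Buggy behavior as described: only persist while the slice is "recovering",
    // so the "recovering" -> "active" transition is never written out.
    static boolean shouldPersistBuggy(String oldSliceState, String newSliceState) {
        return "recovering".equals(newSliceState);
    }

    // Proposed fix: persist whenever the slice's state changes at all.
    static boolean shouldPersistFixed(String oldSliceState, String newSliceState) {
        return !oldSliceState.equals(newSliceState);
    }

    public static void main(String[] args) {
        // The transition seen in the failing test log above:
        System.out.println(shouldPersistBuggy("recovering", "active"));  // false: state.json never updated
        System.out.println(shouldPersistFixed("recovering", "active"));  // true: active slices get persisted
    }
}
```

Under the buggy predicate, the sub-shards' switch to "active" returns false and the cluster state on disk stays stale, matching the "state.json is not persisted" DEBUG line; the fixed predicate persists on any change.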