[ https://issues.apache.org/jira/browse/SOLR-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680184#comment-17680184 ]
Bruno Roustant commented on SOLR-16438:
---------------------------------------

This lightly flapping test failure is caused by a SocketTimeoutException. I can't reproduce it on my machine, but the test environments seem to need more time during the split, especially since this test runs three hosts in a mini cluster. I am changing the test to set the socket timeout to a higher value (120s) than the default 90s set in MiniSolrCloudCluster.

> Shard split should be able to set preferred leaders on other replicas
> ---------------------------------------------------------------------
>
>                 Key: SOLR-16438
>                 URL: https://issues.apache.org/jira/browse/SOLR-16438
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>             Fix For: 9.2
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, shard split always creates a first replica for each sub-shard on
> the current host. Then it creates the other replicas, and their corresponding
> sub-shards are in RECOVERY state. The effect is that the first replica (on
> the current host) is always the leader, meaning that if the sub-shards are
> split themselves, their sub-sub-shard leaders are also on the same host.
> This can lead to a very unbalanced situation where the same host is the
> leader for a whole set of shards.
> A solution to distribute the leaders evenly is to flag some other replicas
> with the preferredLeader property during the split. Then a rebalance-leaders
> command can elect the appropriate leaders. If we do that for each split, then
> all the sub-shards have their leaders correctly balanced.
> To go further, we can improve CollectionsHandler#CollectionOperation to
> support combined operations. That way a CollectionOperation#SPLITSHARD_OP can
> trigger a split op, then a wait for split completion op, and then a rebalance
> leaders op.
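
For readers who want to see the flow from the description in concrete terms, here is a rough sketch of the manual equivalent using the existing Collections API actions SPLITSHARD, ADDREPLICAPROP and REBALANCELEADERS. It only builds the request URLs and does not contact a cluster. It is not the change made for this issue, which sets the property inside the split itself. The base URL, the collection name "demo", the sub-shard name "shard1_0" and the replica name "core_node7" are assumptions chosen for illustration, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds Collections API URLs for the three manual steps.
public class SplitLeaderSketch {

    // Joins the parameters into a query string on the /admin/collections endpoint.
    static String collectionsUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/admin/collections?" + query;
    }

    // Step 1: split the parent shard into sub-shards.
    static String splitShardUrl(String baseUrl, String collection, String shard) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "SPLITSHARD");
        p.put("collection", collection);
        p.put("shard", shard);
        return collectionsUrl(baseUrl, p);
    }

    // Step 2: flag a replica on another host as the preferred leader of a sub-shard.
    static String preferredLeaderUrl(String baseUrl, String collection,
                                     String shard, String replica) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "ADDREPLICAPROP");
        p.put("collection", collection);
        p.put("shard", shard);
        p.put("replica", replica);
        p.put("property", "preferredLeader");
        p.put("property.value", "true");
        return collectionsUrl(baseUrl, p);
    }

    // Step 3: ask Solr to make the replicas flagged preferredLeader the leaders.
    static String rebalanceLeadersUrl(String baseUrl, String collection) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "REBALANCELEADERS");
        p.put("collection", collection);
        return collectionsUrl(baseUrl, p);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed local node
        System.out.println(splitShardUrl(base, "demo", "shard1"));
        System.out.println(preferredLeaderUrl(base, "demo", "shard1_0", "core_node7"));
        System.out.println(rebalanceLeadersUrl(base, "demo"));
    }
}
```

In practice step 2 would be repeated for each sub-shard, picking replicas on different hosts, and step 3 should only run once the split has completed, which is the ordering the combined operation in the description is meant to guarantee.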