[ 
https://issues.apache.org/jira/browse/SOLR-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032590#comment-16032590
 ] 

Bernd Fehling commented on SOLR-10733:
--------------------------------------

Some explanation of Rule.java / ReplicaAssigner.java and of what this patch 
does.

After running all parts many times to understand Rule and ReplicaAssigner, 
this is *roughly* how it works.
- numShards and replicationFactor build a sorted list of Positions e.g. 
[shard1:0,shard2:0,shard1:1,shard2:1,...]
- there is a sorted list of LiveNodes
- there is a list with all Rules

It selects the first shard from the Positions list, takes the first node from 
the LiveNodes list, and checks the node with its tags against all Rules.
- if the selected node with its tags doesn't pass *all* Rules, it is skipped 
and the next node is selected
- if the selected node with its tags passes *all* Rules, it is assigned to the 
selected shard
This continues until all shards with their replicas are filled.
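
The loop described above can be sketched roughly as follows. This is only an 
illustration and not the code in Rule.java / ReplicaAssigner.java: the class 
name, the method names, the modelling of a rule as a 
BiPredicate<position, node>, and the "one replica per node" restriction 
(as with maxShardsPerNode=1) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch; not the actual Solr implementation.
class GreedyPlacementSketch {

    // Builds the Positions list in the order described above:
    // [shard1:0, shard2:0, shard1:1, shard2:1, ...]
    static List<String> buildPositions(int numShards, int replicationFactor) {
        List<String> positions = new ArrayList<>();
        for (int replica = 0; replica < replicationFactor; replica++) {
            for (int shard = 1; shard <= numShards; shard++) {
                positions.add("shard" + shard + ":" + replica);
            }
        }
        return positions;
    }

    // For each position, walks liveNodes in their fixed order and takes the
    // first unused node that passes every rule. Positions for which no node
    // passes are simply left out of the result.
    static Map<String, String> assign(List<String> positions,
                                      List<String> liveNodes,
                                      List<BiPredicate<String, String>> rules) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String position : positions) {
            for (String node : liveNodes) {
                if (result.containsValue(node)) {
                    continue; // assumption: at most one replica per node
                }
                boolean passesAll =
                        rules.stream().allMatch(r -> r.test(position, node));
                if (passesAll) {
                    result.put(position, node);
                    break;
                }
                // A failing node is only skipped; the order of liveNodes
                // never changes, which is what Problem 1 is about.
            }
        }
        return result;
    }
}
```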

Problem 1)
If the selected node under test doesn't pass the Rules, it is simply skipped.
If we have a rule without wildcards, the node with its tags (port, node, 
rack, ...) might fail for this shard but could later pass the Rules if tested 
against other Positions. But this will never happen, because the list of 
LiveNodes always has the same sequence.
The solution here is to move the node to the end of the LiveNodes list so it 
might match later on.

Problem 2)
It is possible (as in testPlacement2) that you have many nodes, but because of 
restrictive Rules only 2 or 3 of the nodes from LiveNodes can be assigned to 
shards. You will run out of nodes passing the Rules, and all nodes failing the 
Rules end up at the end of the LiveNodes list. This is solved by checking the 
position in LiveNodes against the number of nodes in LiveNodes. If the position 
is higher, it starts again from the beginning of LiveNodes.
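
One way to picture the handling of Problems 1 and 2 together is the small 
sketch below. It is not the patch itself: the class RotatingNodeCursor, its 
"failed" counter and the reuse of nodes across calls are assumptions made for 
illustration. A node that fails is moved to the end of the list, and once the 
cursor reaches the part of the list that holds only failed nodes, it starts 
again from the beginning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; not the code from SOLR-10733.patch.
class RotatingNodeCursor {
    private final List<String> liveNodes;
    private int cursor = 0; // position in liveNodes, kept between calls

    RotatingNodeCursor(List<String> nodes) {
        this.liveNodes = new ArrayList<>(nodes);
    }

    // Returns the next node that passes the rules, or null if none does.
    String next(Predicate<String> passesAllRules) {
        int failed = 0; // nodes moved to the end during this call
        for (int tried = 0; tried < liveNodes.size(); tried++) {
            // Problem 2: the tail holds only nodes that already failed,
            // so wrap around to the beginning of the list.
            if (cursor >= liveNodes.size() - failed) {
                cursor = 0;
            }
            String node = liveNodes.get(cursor);
            if (passesAllRules.test(node)) {
                cursor++;
                return node;
            }
            // Problem 1: move the failing node to the end so that it can
            // be tested again later against another position.
            liveNodes.remove(cursor);
            liveNodes.add(node);
            failed++;
        }
        return null;
    }
}
```

With the nodes a:7574, b:8983, c:7574, d:8983 and a test that only accepts 
port 8983, repeated calls to next() alternate between b:8983 and d:8983 
instead of getting stuck on the failing nodes.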

Problem 3)
If you want to have only 1 replica, the rule will be "replica:<2" (as stated in 
"Rule-based Replica Placement" in the Solr documentation).
Because each node is also counted as a replica, this leads to the situation 
where a shard already has one node assigned.
For the next node, the test against the Rules will be "is the number of 
replicas less than 2", which is true and passes. So the node will be assigned 
to the shard.
At the end of all assignments there is a final test which verifies the result 
against all Rules.
But now the same test which passed before, "is the number of replicas less 
than 2", will fail because this test is done *after* the assignment.
This is solved by decreasing "NumberOfNodesWithSameTagVal" by one during the 
VERIFY phase.
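
The off-by-one between the two phases can be shown with a tiny sketch. The 
class, the enum and the method signature are hypothetical and only model the 
comparison described above; the real check in Rule.java may look different.

```java
// Hypothetical sketch of the "replica:<2" comparison in both phases.
class ReplicaCountRuleSketch {
    enum Phase { ASSIGN, VERIFY }

    // replicasWithSameTagValue stands in for "NumberOfNodesWithSameTagVal".
    // During ASSIGN it is counted before the candidate node is added,
    // during VERIFY it already includes that node.
    static boolean passes(int replicasWithSameTagValue,
                          int upperBoundExclusive,
                          Phase phase) {
        int counted = replicasWithSameTagValue;
        if (phase == Phase.VERIFY) {
            counted -= 1; // compensate for the node assigned in between
        }
        return counted < upperBoundExclusive;
    }
}
```

With one replica already present, passes(1, 2, Phase.ASSIGN) is true and the 
node is assigned. The final check then sees two replicas, and 
passes(2, 2, Phase.VERIFY) is also true only because of the decrement.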


> Rule-based Replica Placement not working correct
> ------------------------------------------------
>
>                 Key: SOLR-10733
>                 URL: https://issues.apache.org/jira/browse/SOLR-10733
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Rules, SolrCloud
>    Affects Versions: 6.5.1
>            Reporter: Bernd Fehling
>            Assignee: Noble Paul
>         Attachments: SOLR-10733.patch, SOLR-10733.patch
>
>
> A setup of a SolrCloud with 6 nodes on 3 servers, e.g.:
> {code}
> server1:8983 , server1:7574
> server2:8983 , server2:7574
> server3:8983 , server3:7574
> {code}
> and a command for creating a new collection with rule:
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=boss&
> collection.configName=boss_configs&numShards=3&replicationFactor=2&
> maxShardsPerNode=1&rule=shard:shard1,replica:<2,port:8983
> {code}
> should create a collection with 3 shards and at least a shard1 with two 
> different nodes running on port 8983.
> {code}
> shard1 --> server_x:8983 ,  server_y:8983
> {code}
> An even more restrictive rule like
> {code}
> rule=shard:shard1,replica:<2,port:8983&rule=shard:shard3,replica:<2,port:7574
> {code}
> should also resolve to a solution, because if it really checks all 
> permutations across shards/replicas/ports and available nodes it should be 
> able to solve this.


