HoustonPutman commented on PR #1650: URL: https://github.com/apache/solr/pull/1650#issuecomment-1588135823
Thanks for the review Radu! > I saw a "parallel" flag, but I think replicas are moved one at a time, correct? If so, it will likely prove to be a bottleneck down the road. The replicas will be moved in parallel, but in the future we definitely should add a bottleneck option here. That's a great call-out. Separate PR IMO. > Am I reading the code right that we compute all the shard movements that have to be done, then do them all, then check if we were successful? Yes, that is correct. And this is also a good future edition I think. We can limit the balanceReplicas logic to `x` movements, then implement those movements then wait for the cluster to become healthy. We could do that logic in a loop until 0 movements are returned. Also a separate PR, but I love the idea. > I saw a maxBalanceSkew property, but I'm not really sure what it does. Is this the number of "weight points" difference between nodes under which we say "don't bother?" And that would be different based on the placement plugin? Currently it does absolutely nothing, but yes, the idea is that you move things around until the difference between the lowest weight and the highest weight is <= `maxBalanceSkew`. I didn't implement it yet, because I'm not sure how useful it would be honestly... I was also thinking originally that it would return an error if the maxBalanceSkew couldn't be achieved, but once again I'm not convinced of a use case here. I should remove it for now, since it is unused. > The balancing code that I saw for the Simple placement plugin seems more complicated than the minimizeCores one. Yeah, simple is more complex than minimizeCores. It was the Legacy implementation, but the name "Simple" was chosen over "Legacy". > In Affinity, we don't seem to really deal with the withCollection case, in the sense that we would probably want to move shards of both collections at the same time - we check if the destination node has that collection, correct? That is correct. It makes sure that a move works with the existing replicas, but it is not yet smart enough to move replicas together. I have ideas on how to improve this down the line, but I don't think its necessary for the first implementation. Basically for `canRemoveReplica(r)`, it would return the list of other replicas that would need to be removed along with that replica, and `canAddReplica(r)`, it would return the list of other replicas that would need to be added along with that replica. Then the orderedNode logic could make it work. Once again, this adds some complexity, and it can likely wait for another PR. > Do I get it right that we soft-enforce constraints like AZ-awareness by adding big weights? If so, would it be better down the road to add them as constraints? So yeah, they can absolutely be used as constraints, but right now (before this PR) they aren't, so this logic is keeping in-line with the logic that existed before. > And I think your PR already does that, no? If I simply create a new replica, the placement plugin will put it on the node with the lowest weight, correct? This is correct, and we can definitely add new constraints (like shardsPerNode/replicasPerNode) to the different PlacementPlugins. It will be especially easy to do given this new framework going forward! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org