HoustonPutman commented on issue #682: URL: https://github.com/apache/solr-operator/issues/682#issuecomment-1969397940
Ok, so y'all's issues seem somewhat related. I have seen problems with Solr failing to delete bad replicas during an unsuccessful migration, and that's why you are seeing a large increase in the number of replicas.

> So i suspect something wrong with the scale down/up / migration of the shards. Every pod gets restarted during the downgrade......

This is definitely a problem, and it's related to the fact that you are addressing your Solr nodes through the ingress. So that all Solr traffic isn't routed through the ingress (which would slow things down considerably), we basically use /etc/hosts on the pods to map each ingress address to the IP of the pod it points to. Since you are scaling down, some of those /etc/hosts entries get removed, which forces a full restart every time. An easy solution would be to only update /etc/hosts when an IP is added or changed; it doesn't really matter if unused entries are left behind (sketched below). Anyway, we should definitely have an integration test that stresses the HPA with ingresses, because this seems like a very iffy edge case.

> The same issue is happening to me

@sabaribose I think this is separate, because you are not using an ingress, but the headless service. I think yours comes from the BalanceReplicas command not being queued for a retry when it fails. But I will do more investigation here.
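For illustration, here is a minimal sketch in Go of that "only update on add/change" check. This is not the operator's actual code; the `hostsUpdateNeeded` name and the hostname-to-IP map representation are just assumptions to show the idea: rewrite /etc/hosts (and roll the pods) only when a hostname is new or its IP changed, and leave entries that merely became unused alone.

```go
package main

import "fmt"

// hostsUpdateNeeded reports whether a pod's /etc/hosts needs to be rewritten.
// It only returns true when a hostname is new or its IP has changed; entries
// that are no longer desired (e.g. after a scale-down) are left in place, so
// removals alone never trigger a restart.
func hostsUpdateNeeded(current, desired map[string]string) bool {
	for host, ip := range desired {
		if existing, ok := current[host]; !ok || existing != ip {
			return true
		}
	}
	return false
}

func main() {
	current := map[string]string{
		"solr-0.ing.example.com": "10.0.0.1",
		"solr-1.ing.example.com": "10.0.0.2",
		"solr-2.ing.example.com": "10.0.0.3", // pod removed by the scale-down
	}
	desired := map[string]string{
		"solr-0.ing.example.com": "10.0.0.1",
		"solr-1.ing.example.com": "10.0.0.2",
	}
	// No IPs were added or changed, so no rewrite (and no restart) is needed.
	fmt.Println(hostsUpdateNeeded(current, desired)) // prints "false"
}
```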