HoustonPutman commented on issue #682: URL: https://github.com/apache/solr-operator/issues/682#issuecomment-1969397940
Ok, so y'all's issues seem somewhat related. I have seen problems with Solr failing to delete bad replicas during an unsuccessful migration, and that's why you are seeing a large increase in the number of replicas.

> So i suspect something wrong with the scale down/up / migration of the shards. Every pod gets restarted during the downgrade......

This is definitely a problem, and it's related to the fact that you are addressing your Solr nodes through the ingress. So that all Solr traffic isn't routed through the ingress (which would slow things down considerably), we basically use /etc/hosts on the pods to map each ingress address to the IP of the pod it points to. Since you are scaling down, some of those /etc/hosts entries get removed, which forces a full restart every time. An easy solution would be to only update /etc/hosts when an IP is added or changed; it doesn't really matter if unused entries are left behind (sketched below). Anyway, we should definitely have an integration test that stresses the HPA with ingresses, because this seems like a very iffy edge case.

> The same issue is happening to me

@sabaribose I think this is separate, because you are not using an ingress, but the headless service. I think yours comes from the BalanceReplicas command not being queued for a retry when it fails. But I will do more investigation here.
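For illustration, here is a minimal sketch in Go of that "only update on add/change" check. This is not the operator's actual code; the `hostsUpdateNeeded` name and the hostname-to-IP map representation are just assumptions to show the idea: rewrite /etc/hosts (and roll the pods) only when a hostname is new or its IP changed, and leave entries that merely became unused alone.

```go
package main

import "fmt"

// hostsUpdateNeeded reports whether a pod's /etc/hosts needs to be rewritten.
// It only returns true when a hostname is new or its IP has changed; entries
// that are no longer desired (e.g. after a scale-down) are left in place, so
// removals alone never trigger a restart.
func hostsUpdateNeeded(current, desired map[string]string) bool {
	for host, ip := range desired {
		if existing, ok := current[host]; !ok || existing != ip {
			return true
		}
	}
	return false
}

func main() {
	current := map[string]string{
		"solr-0.ing.example.com": "10.0.0.1",
		"solr-1.ing.example.com": "10.0.0.2",
		"solr-2.ing.example.com": "10.0.0.3", // pod removed by the scale-down
	}
	desired := map[string]string{
		"solr-0.ing.example.com": "10.0.0.1",
		"solr-1.ing.example.com": "10.0.0.2",
	}
	// No IPs were added or changed, so no rewrite (and no restart) is needed.
	fmt.Println(hostsUpdateNeeded(current, desired)) // prints "false"
}
```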