On 4/25/2018 1:29 AM, Lukas Tribus wrote:
> You seem to be able to reproduce this easily, so please share the logs
> when this happens including the requests (don't use dontlognull), so
> that we can see the server up/down events and all the successful
> and failing requests together with timestamps, return status and
> return codes.

I can't mess too much with the setup where I saw the problem, because it's actively being used for some critical work right now, but I do have another install where I'm using the backup keyword for Solr servers behind haproxy.

backend be_sp
  description Solr backend for the Spark index.
  option  httpchk GET /solr/sparkmain/admin/ping
  balance leastconn
  timeout check   4990
  server  idxa6 10.100.0.250:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100
  server  idxb6 10.100.0.251:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100 backup
  server  idxa3 10.100.0.244:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 30 backup
  server  idxb3 10.100.0.245:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 20 backup
  server  bigindy5 10.100.1.39:8982 check inter 15s fastinter 2s rise 2 fall 1 weight 10 backup

I tried a similar experiment on that setup, and couldn't reproduce the behavior.  With a loop sending requests using curl every two seconds, I only got one 503 "no server available" response.  Here are some logs from haproxy:

Apr 25 10:10:32 localhost haproxy[20272]: 10.2.0.48:48065 [25/Apr/2018:10:10:31.358] fe_sp_8986 be_sp/idxa6 0/1001/-1/-1/1002 503 212 - - SC-- 5/0/0/0/1 0/0 "GET /solr/sparkmain/admin/ping?echoParams=none&shards.info=false HTTP/1.1"
Apr 25 10:10:32 localhost haproxy[20272]: Server be_sp/idxa6 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 4 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Apr 25 10:10:34 localhost haproxy[20272]: 10.2.0.48:48066 [25/Apr/2018:10:10:34.384] fe_sp_8986 be_sp/idxb6 0/0/0/14/14 200 371 - - ---- 5/1/0/1/0 0/0 "GET /solr/sparkmain/admin/ping?echoParams=none&shards.info=false HTTP/1.1"
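
In case it helps anyone reading request lines like the ones above, here is a rough Python sketch that splits a haproxy HTTP log line into named fields.  The field order follows haproxy's documented HTTP log format (timers named Tq/Tw/Tc/Tr/Tt in 1.5), but the regex and the function name parse_http_log are my own assumptions, and I have only tried it against lines shaped like these.

```python
import re

# Assumed pattern for haproxy's default HTTP log format; not taken from haproxy.
# It expects no captured headers (no {...} blocks) between the queue counters
# and the quoted request.
HTTP_LOG_RE = re.compile(
    r'(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/(?P<tt>\+?-?\d+) '
    r'(?P<status>-?\d+) (?P<bytes>\+?\d+) \S+ \S+ (?P<term_state>\S{4}) '
    r'(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/'
    r'(?P<retries>\+?\d+) '
    r'(?P<srv_queue>\d+)/(?P<backend_queue>\d+) "(?P<request>[^"]*)"'
)

def parse_http_log(line):
    """Return a dict of fields for an HTTP request log line, or None.

    Server state-change lines ("Server ... is DOWN") do not match and
    yield None.
    """
    match = HTTP_LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timers are in milliseconds; -1 means that phase never completed.
    for key in ("tq", "tw", "tc", "tr", "tt", "status", "retries"):
        fields[key] = int(fields[key])
    return fields
```

Run against the first line above, this gives server idxa6, status 503, Tc of -1 (the connection to the server was never established) and termination state SC--, which the haproxy documentation describes as the server explicitly refusing the TCP connection.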

And here are the two responses corresponding to those logs:

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">13</int></lst><str name="status">OK</str>
</response>

What I just saw with this is more in line with the behavior I'm after.  It would be nice if I didn't get any request failures at all, but having failures happen for a very short time isn't a problem.

When I did this before on the planet/hollywood backend, I got a whole bunch of "no server available" responses in a row from the curl-based script I was running.  The install where I saw the problem is running 1.5.12; this one, where things seem to work right, is running 1.5.14.
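
For anyone who wants to repeat the test without curl, here is a minimal Python sketch of the same kind of polling loop.  It is not the script I actually ran; the function names and the placeholder URL in the comment are assumptions for illustration.  It sends one request every two seconds, records the status code, and reports the longest run of consecutive 503 responses.

```python
import time
import urllib.error
import urllib.request

def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of one GET, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # haproxy's 503 "no server available" is raised as HTTPError.
        return err.code

def poll(url, count, interval=2.0, fetch=fetch_status, sleep=time.sleep):
    """Send `count` requests `interval` seconds apart.

    Returns a list of (local time string, status code) tuples.
    """
    results = []
    for i in range(count):
        results.append((time.strftime("%H:%M:%S"), fetch(url)))
        if i < count - 1:
            sleep(interval)
    return results

def longest_run(results, status=503):
    """Length of the longest streak of consecutive `status` responses."""
    longest = current = 0
    for _, code in results:
        current = current + 1 if code == status else 0
        longest = max(longest, current)
    return longest

# Hypothetical usage; replace the host and port with your haproxy frontend:
# results = poll("http://haproxy.example.com/solr/sparkmain/admin/ping", 30)
# print(longest_run(results))
```

Connection-level failures (urllib.error.URLError) are deliberately not caught here, so a dead frontend stops the loop instead of being counted as a 503.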

Thanks,
Shawn

