Re: Backup server takes too long to go active

Shawn Heisey Wed, 25 Apr 2018 09:34:44 -0700

On 4/25/2018 1:29 AM, Lukas Tribus wrote:

You seem to be able to reproduce this easily, so please share the logs
when this happens including the requests (don't use dontlognull), so
that we can see the server up/down events and all the successful
and failing requests together with timestamps, return status and
return codes.

I can't mess too much with the setup where I saw the problem, because it's actively being used for some critical work right now, but I do have another install where I'm using the backup keyword for Solr servers behind haproxy.


backend be_sp
  description Solr backend for the Spark index.
  option  httpchk GET /solr/sparkmain/admin/ping
  balance leastconn
  timeout check   4990

  server idxa6 10.100.0.250:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100
  server idxb6 10.100.0.251:8981 check inter 5s fastinter 2s rise 3 fall 2 weight 100 backup
  server idxa3 10.100.0.244:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 30 backup
  server idxb3 10.100.0.245:8981 check inter 15s fastinter 2s rise 2 fall 1 weight 20 backup
  server bigindy5 10.100.1.39:8982 check inter 15s fastinter 2s rise 2 fall 1 weight 10 backup
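As a rough way to read the check settings above, here is a sketch that estimates the worst-case time before a server is marked DOWN or UP. It rests on a simplified model drawn from a reading of the HAProxy `inter`/`fastinter`/`downinter` descriptions: no `downinter` is set, the first check after a state change can come a full `inter` later, the remaining checks of the transition run at `fastinter`, and check duration, `timeout check`, and check jitter are ignored. `CheckParams`, `worst_case_down`, and `worst_case_up` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CheckParams:
    """Health-check timing for one server, in seconds (simplified model)."""
    inter: float      # interval between checks while the server is stable
    fastinter: float  # interval while the server is transitioning UP<->DOWN
    rise: int         # consecutive successful checks needed to mark UP
    fall: int         # consecutive failed checks needed to mark DOWN


def worst_case_down(p: CheckParams) -> float:
    # Server dies just after a successful check: a full `inter` passes before
    # the first failed check, then (fall - 1) more failures at `fastinter`.
    return p.inter + (p.fall - 1) * p.fastinter


def worst_case_up(p: CheckParams) -> float:
    # Assumes no `downinter`, so a DOWN server is still checked every `inter`;
    # after the first success, the remaining (rise - 1) checks use `fastinter`.
    return p.inter + (p.rise - 1) * p.fastinter


# Values copied from the be_sp config above.
servers = {
    "idxa6": CheckParams(inter=5, fastinter=2, rise=3, fall=2),
    "idxa3": CheckParams(inter=15, fastinter=2, rise=2, fall=1),
}

for name, p in servers.items():
    print(f"{name}: DOWN after <= {worst_case_down(p)}s, UP after <= {worst_case_up(p)}s")
```

Under that model idxa6 could take up to about 7 seconds to be marked DOWN, and the backup idxa3 up to about 17 seconds to come back UP. A check that hangs rather than being refused would stretch each failed check by up to `timeout check` (4990 ms here).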

I tried a similar experiment on that setup, and couldn't see the same behavior. With a loop sending requests using curl every two seconds, I only got one 503 "no server available" response. Here are some logs from haproxy:

Apr 25 10:10:32 localhost haproxy[20272]: 10.2.0.48:48065 [25/Apr/2018:10:10:31.358] fe_sp_8986 be_sp/idxa6 0/1001/-1/-1/1002 503 212 - - SC-- 5/0/0/0/1 0/0 "GET /solr/sparkmain/admin/ping?echoParams=none&shards.info=false HTTP/1.1"
Apr 25 10:10:32 localhost haproxy[20272]: Server be_sp/idxa6 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 4 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Apr 25 10:10:34 localhost haproxy[20272]: 10.2.0.48:48066 [25/Apr/2018:10:10:34.384] fe_sp_8986 be_sp/idxb6 0/0/0/14/14 200 371 - - ---- 5/1/0/1/0 0/0 "GET /solr/sparkmain/admin/ping?echoParams=none&shards.info=false HTTP/1.1"
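To line entries like these up by server, status, and timing when sharing them, a small parser can help. This is a sketch that assumes the HAProxy 1.5 `option httplog` field order (client:port, accept date, frontend, backend/server, the Tq/Tw/Tc/Tr/Tt timers, status, bytes, two captured cookies, termination state, connection counts, queue counts, request). Lines that don't fit that shape, such as the "Server ... is DOWN" event, come back as `None`. `HTTPLOG_RE` and `parse_httplog` are hypothetical names.

```python
import re

# Assumed HAProxy 1.5 "option httplog" layout:
# client:port [accept_date] frontend backend/server Tq/Tw/Tc/Tr/Tt status bytes
# req_cookie resp_cookie termination_state ac/fc/bc/sc/rc sq/bq "request"
HTTPLOG_RE = re.compile(
    r"(?P<client>\S+):(?P<port>\d+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<timers>-?\d+/-?\d+/-?\d+/-?\d+/\+?-?\d+) "
    r"(?P<status>\d{3}) (?P<bytes>\+?\d+) "
    r"\S+ \S+ (?P<term>\S{4}) "
    r"(?P<conns>\d+/\d+/\d+/\d+/\+?\d+) (?P<queues>\d+/\d+) "
    r'"(?P<request>[^"]*)"'
)
TIMER_NAMES = ("Tq", "Tw", "Tc", "Tr", "Tt")


def parse_httplog(line: str):
    """Return a dict of fields for an HTTP log line, or None for anything else
    (for example a 'Server ... is DOWN' health-check event)."""
    m = HTTPLOG_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timers are integers in ms; -1 means the step never happened.
    rec["timers"] = dict(
        zip(TIMER_NAMES, (int(t.lstrip("+")) for t in rec["timers"].split("/")))
    )
    rec["status"] = int(rec["status"])
    return rec


sample = [
    'Apr 25 10:10:32 localhost haproxy[20272]: 10.2.0.48:48065 [25/Apr/2018:10:10:31.358] '
    'fe_sp_8986 be_sp/idxa6 0/1001/-1/-1/1002 503 212 - - SC-- 5/0/0/0/1 0/0 '
    '"GET /solr/sparkmain/admin/ping?echoParams=none&shards.info=false HTTP/1.1"',
    # Health-check event (abridged): not an HTTP log line, so it parses to None.
    'Apr 25 10:10:32 localhost haproxy[20272]: Server be_sp/idxa6 is DOWN, reason: '
    'Layer4 connection problem, info: "Connection refused", check duration: 0ms.',
    'Apr 25 10:10:34 localhost haproxy[20272]: 10.2.0.48:48066 [25/Apr/2018:10:10:34.384] '
    'fe_sp_8986 be_sp/idxb6 0/0/0/14/14 200 371 - - ---- 5/1/0/1/0 0/0 '
    '"GET /solr/sparkmain/admin/ping?echoParams=none&shards.info=false HTTP/1.1"',
]

for line in sample:
    rec = parse_httplog(line)
    if rec is not None:
        print(rec["accept_date"], rec["server"], rec["status"], rec["term"], rec["timers"])
```

On these three lines it reports the 503 from idxa6 with termination state `SC--`, skips the DOWN event, and reports the 200 from the backup idxb6 about three seconds later.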


And here are the two responses corresponding to those logs:

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader"><int name="status">0</int><int name="QTime">13</int></lst><str name="status">OK</str>

</response>
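When scripting a test loop like this, it can help to tell the two bodies apart instead of only checking the status code. A minimal sketch, assuming the HAProxy 503 page and the Solr ping XML look exactly like the samples above; `classify_ping` is a hypothetical helper, not part of HAProxy or Solr.

```python
import xml.etree.ElementTree as ET

# Text from HAProxy's built-in 503 page, as seen above.
NO_SERVER_TEXT = "No server is available to handle this request."


def classify_ping(body: str) -> str:
    """Hypothetical helper: label a body returned for the ping URL.

    "no-server" -> HAProxy's own 503 page
    "ok"        -> Solr ping XML whose top-level <str name="status"> is OK
    "other"     -> anything else
    """
    if NO_SERVER_TEXT in body:
        return "no-server"
    try:
        # Encode first so the XML declaration's encoding attribute is honored.
        root = ET.fromstring(body.encode("utf-8"))
    except ET.ParseError:
        return "other"
    status = root.find("./str[@name='status']")
    if status is not None and status.text == "OK":
        return "ok"
    return "other"


haproxy_503 = (
    "<html><body><h1>503 Service Unavailable</h1>\n"
    "No server is available to handle this request.\n"
    "</body></html>"
)
solr_ok = (
    '<?xml version="1.0" encoding="UTF-8"?>\n<response>\n'
    '<lst name="responseHeader"><int name="status">0</int>'
    '<int name="QTime">13</int></lst><str name="status">OK</str>\n'
    "</response>"
)
print(classify_ping(haproxy_503), classify_ping(solr_ok))  # prints "no-server ok"
```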

What I just saw with this is more in line with the behavior I'm after. It would be nice if I didn't get any request failures at all, but having failures happen for a very short time isn't a problem.

When I did this before on the planet/hollywood backend, I got a whole bunch of "no server available" responses in a row from the curl-based script I was running. The one where I saw the problem is running 1.5.12; this one, where things seem to work right, is running 1.5.14.
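The curl-based script itself isn't shown here. A rough Python stand-in for that kind of loop is sketched below: it requests the ping URL every two seconds and reports the longest streak of 503 responses, which is the number that distinguishes a single failure from "a whole bunch in a row". The host and port in `URL` are placeholders (the port is only guessed from the frontend name fe_sp_8986), and `fetch_status`, `poll`, and `longest_run` are hypothetical names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List

# Hypothetical URL: placeholder host/port; the path is the ping request from the logs.
URL = "http://haproxy.example:8986/solr/sparkmain/admin/ping?echoParams=none&shards.info=false"


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code; error statuses such as 503 are returned, not raised.
    A connection-level failure (nothing answering at all) is reported as 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # must come before URLError (it's a subclass)
        return err.code
    except urllib.error.URLError:
        return 0


def poll(fetch: Callable[[], int], count: int, interval: float = 2.0,
         sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Call fetch() `count` times, `interval` seconds apart, collecting status codes."""
    codes = []
    for i in range(count):
        codes.append(fetch())
        if i < count - 1:
            sleep(interval)
    return codes


def longest_run(codes: List[int], bad: int = 503) -> int:
    """Length of the longest consecutive run of `bad` status codes."""
    best = run = 0
    for code in codes:
        run = run + 1 if code == bad else 0
        best = max(best, run)
    return best


# Usage (makes real network requests):
#   codes = poll(lambda: fetch_status(URL), count=30)
#   print(codes, "longest 503 run:", longest_run(codes))
```

Passing `fetch` and `sleep` in as parameters keeps the loop testable without a live haproxy instance.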


Thanks,
Shawn
