On 4/24/2018 3:38 PM, Cyril Bonté wrote:
On 24/04/2018 at 23:07, Shawn Heisey wrote:
The configuration I had is with a backend that has two servers, one of
them tagged as backup. This is the actual config that I had active when
I saw the problem:

backend be-cdn-9000
         description Back end for the thumbs CDN
         cookie MSDSRVHA insert indirect nocache
         server planet 10.100.2.123:9000 weight 100 cookie planet track chk-cdn-9000/planet
         server hollywood 10.100.2.124:9000 weight 100 backup cookie hollywood track chk-cdn-9000/hollywood

Well, you don't provide any information about the tracked servers chk-cdn-9000/planet and chk-cdn-9000/hollywood.

This is the tracking backend at the time.  In the current config, this backend no longer exists.  I couldn't get the disable-on-404 setting to work with a tracking back end, so the real backend does the health checks now.

backend chk-cdn-9000
  description A healthcheck backend for the thumbnail CDN.
  option httpchk GET /healthcheck
  server planet 10.100.2.123:9000 check inter 10s fastinter 3s rise 3 fall 2
  server hollywood 10.100.2.124:9000 check inter 10s fastinter 3s rise 3 fall 2

Without any information about the two tracked servers, I'd say the behaviour is expected. A backup server is promoted only if it is UP itself. What is the state of chk-cdn-9000/hollywood during that time? It looks like it's not UP yet.

Before beginning, both servers were up.  The one named planet was active, the one named hollywood was backup.  I was watching the status page closely the whole time.

I updated the software on hollywood and stopped the service on that system.  After waiting long enough for haproxy to notice the server going down, I started it back up.  After a short time, it went to the up state (still backup).  So at this point the state is identical to the starting state.

Then I updated and stopped planet.  Understandably, haproxy noticed that planet went down.  But instead of promoting hollywood to an active state as soon as planet was marked down, it waited an additional period (about ten seconds, I think, though I did not time it precisely).  During that period, a curl client trying to connect to the load balanced URL was receiving "no server available" messages.  Once hollywood was promoted to active, everything was good.

Because of the delay in promoting the backup server, I removed the backup keyword from the back end, and requests are now load balanced equally between both servers (in the absence of a cookie).  But I do have another haproxy setup where that is not an acceptable solution.
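
For reference, the current arrangement described above (no tracking backend, health checks done by the real backend, no backup keyword) might look roughly like the sketch below.  This is a reconstruction rather than a copy of the live config: it assumes the check parameters and health check URL were carried over unchanged from chk-cdn-9000, and that disable-on-404 is enabled with the http-check directive.

```
backend be-cdn-9000
         description Back end for the thumbs CDN
         cookie MSDSRVHA insert indirect nocache
         # Assumption: same health check URL as the old tracking backend
         option httpchk GET /healthcheck
         # Assumption: a 404 from the health check puts the server into maintenance
         http-check disable-on-404
         # Both servers are active (no "backup") and run their own checks
         server planet 10.100.2.123:9000 weight 100 cookie planet check inter 10s fastinter 3s rise 3 fall 2
         server hollywood 10.100.2.124:9000 weight 100 cookie hollywood check inter 10s fastinter 3s rise 3 fall 2
```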

I'm hoping to figure out how to make a backup server transition immediately to active as soon as the primary server is marked down.  If you need additional info, please let me know.

Thanks,
Shawn

