[I've been using HAProxy for 5 years, but with a static configuration that gets reloaded.]
Because of issues with the speed of backend discovery through DNS on AWS,
I'm writing my own system to insert servers into my load balancers on the fly.
As backend server names, I'm using the task ID from my cloud provider:
backend pages
    timeout server 120s
    option forwardfor
    http-request redirect scheme https if ! { ssl_fc }
    option httpchk GET /health.php
    default-server inter 5s fall 3 rise 2
    balance random
    server bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80 weight 10 maxconn 16 check slowstart 10s
With a config built from the cluster status, everything works fine (16:33:58).
When AWS/ECS sends me a new task, I register it with these 3 commands (at
16:34:27):
echo "add server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80 weight 10 maxconn 32 check inter 5s fall 3 rise 2 slowstart 10s " | netcat -w 2 172.31.33.146 9999
New server registered.
echo "enable health pages/bdb47d1ac9644c5f99c5e90dd4f9b944" | netcat -w 2 172.31.33.146 9999
echo "enable server pages/bdb47d1ac9644c5f99c5e90dd4f9b944" | netcat -w 2 172.31.33.146 9999
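For anyone reproducing this, the three commands above can be wrapped in a small function so every new task is registered the same way. This is only a sketch: it assumes the runtime API listens on TCP 172.31.33.146:9999 (as in the commands above) and that `netcat` is installed; `HAPROXY_API_HOST`, `HAPROXY_API_PORT`, `haproxy_cmd` and `register_task` are hypothetical names, not part of HAProxy.

```shell
#!/bin/sh
# Hypothetical wrapper around the three runtime API commands shown above.
# Assumption: the HAProxy runtime API is reachable over TCP at this address.
HAPROXY_API_HOST="${HAPROXY_API_HOST:-172.31.33.146}"
HAPROXY_API_PORT="${HAPROXY_API_PORT:-9999}"

# Send one command to the runtime API and print HAProxy's reply.
haproxy_cmd() {
    printf '%s\n' "$1" | netcat -w 2 "$HAPROXY_API_HOST" "$HAPROXY_API_PORT"
}

# Usage: register_task BACKEND SERVER_NAME ADDR:PORT
# Issues the same three commands, in the same order, as in the post.
# It stops early only if netcat itself fails; HAProxy's reply text is
# printed but not inspected.
register_task() {
    backend=$1
    name=$2
    addr=$3
    haproxy_cmd "add server $backend/$name $addr weight 10 maxconn 32 check inter 5s fall 3 rise 2 slowstart 10s" || return 1
    haproxy_cmd "enable health $backend/$name" || return 1
    haproxy_cmd "enable server $backend/$name"
}
```

Usage: `register_task pages bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239:80`.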
As you can see in the logs, the server is registered and marked as UP.
But when a request comes in a few seconds later, the backend can't find a
suitable server to handle it: the request waits 15 s in the queue and ends
in a 503 with pages/<NOSRV>.
Feb 7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE] (42439) : haproxy version is 2.7.2-1ppa1~jammy
Feb 7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE] (42439) : path to executable is /usr/sbin/haproxy
Feb 7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE] (42439) : New worker (42442) forked
Feb 7 16:33:29 ip-172-31-33-146 haproxy[42439]: [NOTICE] (42439) : Loading success.
Feb 7 16:33:58 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:57352 [07/Feb/2023:16:33:57.712] www~ pages/bdb47d1ac9644c5f99c5e90dd4f9b944 0/0/0/1131/1141 200 67569 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1" www.XXXXXXXX Wget/1.20.3 (linux-gnu)
Feb 7 16:34:15 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is going DOWN for maintenance. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Feb 7 16:34:15 ip-172-31-33-146 haproxy[42442]: [ALERT] (42442) : backend 'pages' has no server available!
Feb 7 16:34:15 ip-172-31-33-146 haproxy[42442]: Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is going DOWN for maintenance. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Feb 7 16:34:15 ip-172-31-33-146 haproxy[42442]: backend pages has no server available!
Feb 7 16:34:20 ip-172-31-33-146 haproxy[42442]: [NOTICE] (42442) : Server deleted.
Feb 7 16:34:27 ip-172-31-33-146 haproxy[42442]: [NOTICE] (42442) : CLI : 'server pages/bdb47d1ac9644c5f99c5e90dd4f9b944' : New server registered.
Feb 7 16:34:40 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced maintenance).
Feb 7 16:34:40 ip-172-31-33-146 haproxy[42442]: Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP/READY (leaving forced maintenance).
Feb 7 16:34:50 ip-172-31-33-146 haproxy[42442]: [WARNING] (42442) : Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Feb 7 16:34:50 ip-172-31-33-146 haproxy[42442]: Server pages/bdb47d1ac9644c5f99c5e90dd4f9b944 is UP. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Feb 7 16:35:16 ip-172-31-33-146 haproxy[42442]: 82.66.114.242:36698 [07/Feb/2023:16:35:01.250] www~ pages/<NOSRV> 0/15001/-1/-1/15001 503 4793 - - sQ-- 1/1/0/0/0 0/1 "GET / HTTP/1.1" www.XXXXXXX Wget/1.20.3 (linux-gnu)
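To compare the two request lines side by side, the timer and status fields can be pulled out with awk. This sketch assumes each line begins with the same five-field syslog prefix as the logs above ("Feb 7 16:35:16 host haproxy[pid]:"), so that backend/server is field 9, the TR/Tw/Tc/Tr/Ta timers are field 10, the status is field 11 and the termination state is field 15. `decode_log_line` is a hypothetical name. Tw is the time spent waiting in the queue, and in the HAProxy documentation "sQ" denotes a request that expired while waiting in the queue, which matches the 15001 ms Tw of the failing request.

```shell
#!/bin/sh
# Hypothetical helper: print the key fields of HAProxy HTTP log lines read on
# stdin. Lines whose 10th field does not split into exactly five "/"-separated
# timers (e.g. NOTICE/WARNING lines) are skipped.
decode_log_line() {
    awk '{
        n = split($10, t, "/")
        if (n != 5) next
        printf "server=%s status=%s TR=%s Tw=%s Tc=%s Tr=%s Ta=%s flags=%s\n",
            $9, $11, t[1], t[2], t[3], t[4], t[5], $15
    }'
}
```

For example, piping the syslog excerpt above through `decode_log_line` prints one summary line per request and ignores the state-change messages.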
The server state looks like this:
echo "show servers state pages" |netcat -w 2 172.31.33.146 9999
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl srv_check_port srv_check_addr srv_agent_addr srv_agent_port
5 pages 1 bdb47d1ac9644c5f99c5e90dd4f9b944 172.31.35.239 2 0 10 10 2087 15 3 4 6 0 0 0 - 80 - 0 0 - - 0
srv_op_state is 2 (running) and srv_check_result is 3, which indicates that
the health checks are passing.
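Reading a 25-column row against a separate header is error-prone, so here is a small sketch that pairs each column name with its value. It assumes the output layout shown above: a format-version line first, then a header line starting with "# ", then one row per server. `show_state_fields` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "show servers state" output on stdin and print
# one "field = value" line per column, with a blank line after each server.
show_state_fields() {
    awk '
        NR == 1 { next }                 # first line is the format version
        /^# / {                          # header: "# be_id be_name ..."
            n = NF - 1
            for (i = 2; i <= NF; i++) name[i - 1] = $i
            next
        }
        NF > 0 {                         # one row per server
            for (i = 1; i <= n; i++) printf "%s = %s\n", name[i], $i
            print ""
        }
    '
}
```

Usage: `echo "show servers state pages" | netcat -w 2 172.31.33.146 9999 | show_state_fields`.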
I'm a bit baffled by the situation. If someone has more experience with
inserting backend servers on the fly with L7 checks, I'd be grateful for any help.
--
Thomas Pedoussaut