Hello,

HAProxy 1.7.10 segfaults when the srv_admin_state of a backend server is set to
SRV_ADMF_CMAINT (0x04) in the server state file and that backend has the
`slowstart` option set.

The following configuration reproduces it:

-----------------------------
# haproxy.cfg (replace <path-to-state-folder> below)

global
    maxconn 30000
    user haproxy
    group haproxy
    server-state-file /<path-to-state-folder>/servers.state

    log-tag haproxy
    nbproc 1
    cpu-map 1 2
    stats socket /run/haproxy.sock level admin
    stats socket /run/haproxy_op.sock mode 666 level operator

defaults
    mode http
    option forwardfor

    option dontlognull
    option httplog
    log 127.0.0.1 local1 debug

    timeout connect 5s
    timeout client 50s
    timeout server 50s
    timeout http-request 8s

    load-server-state-from-file global
listen admin
    bind *:9002
    stats enable
    stats auth haproxyadmin:xxxxxxx

frontend testserver
        bind *:9000
        option tcp-smart-accept
        option splice-request
        option splice-response
        default_backend testservers

backend testservers
        balance roundrobin
        option tcp-smart-connect
        option splice-request
        option splice-response
        timeout server 2s
        timeout queue 2s
        default-server maxconn 10 slowstart 10s weight 1
        server testserver15 10.0.19.10:9003        check
        server testserver16 10.0.19.12:9003        check

        server testserver17 169.254.0.9:9003 disabled        check
        server testserver20 169.254.0.9:9003 disabled        check


# servers.state file

1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id
4 testservers 1 testserver15 10.0.19.10 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 2 testserver16 10.0.19.12 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 3 testserver17 169.254.0.9 0 5 1 1 924 1 0 0 14 0 0 0
4 testservers 4 testserver20 10.0.19.17 0 4 1 1 454 6 3 4 6 0 0 0

--------------------

The srv_admin_state value of 4 for testserver20 above triggers the segfault, and
the crash only occurs when slowstart is set.
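
To make the admin state values easier to read, here is a small Python sketch
that decodes srv_admin_state as a bitmask and lists the state-file entries
whose value is exactly 4. Only SRV_ADMF_CMAINT (0x04) is taken from this
report; the other flag values are my recollection of HAProxy's server header
and should be treated as assumptions. The helper names are hypothetical.

```python
# Flag values: 0x04 (CMAINT) comes from the report above; the rest are
# assumed from HAProxy's server header and may differ between versions.
SRV_ADMIN_FLAGS = {
    0x01: "SRV_ADMF_FMAINT",  # maintenance forced by an admin (socket/CLI)
    0x02: "SRV_ADMF_IMAINT",  # maintenance inherited from a tracked server
    0x04: "SRV_ADMF_CMAINT",  # maintenance coming from the configuration
    0x08: "SRV_ADMF_FDRAIN",  # drain forced by an admin
    0x10: "SRV_ADMF_IDRAIN",  # drain inherited from a tracked server
    0x20: "SRV_ADMF_RMAINT",  # maintenance due to DNS resolution
}

ADMIN_STATE_COLUMN = 6  # 0-based index of srv_admin_state in a version 1 line


def decode_admin_state(value):
    """Return the names of the flags set in a srv_admin_state value."""
    return [name for bit, name in sorted(SRV_ADMIN_FLAGS.items()) if value & bit]


def cmaint_only_servers(state_text):
    """Return (backend, server) pairs whose admin state is exactly 0x04."""
    hits = []
    for line in state_text.splitlines():
        fields = line.split()
        # Skip the version line, the header comment and malformed lines.
        if line.lstrip().startswith("#") or len(fields) <= ADMIN_STATE_COLUMN:
            continue
        state = fields[ADMIN_STATE_COLUMN]
        if state.isdigit() and int(state) == 0x04:
            hits.append((fields[1], fields[3]))
    return hits
```

With the state file above, decode_admin_state(5) for testserver17 gives
FMAINT plus CMAINT, while testserver20 is the only entry carrying CMAINT
alone.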

The configuration check alone is enough to reproduce it, i.e.: haproxy -c -f haproxy.cfg

The backtrace:

(gdb) bt
#0  task_schedule (when=-508447097, task=0x0) at include/proto/task.h:244
#1  srv_clr_admin_flag (mode=SRV_ADMF_FMAINT, s=0x1fb0fd0) at src/server.c:626
#2  srv_adm_set_ready (s=0x1fb0fd0) at include/proto/server.h:231
#3  srv_update_state (params=0x7ffe4f15e7d0, version=1, srv=0x1fb0fd0) at src/server.c:2289
#4  apply_server_state () at src/server.c:2664
#5  0x000000000044b60f in init (argc=<optimized out>, argc@entry=4, argv=<optimized out>, argv@entry=0x7ffe4f160d38) at src/haproxy.c:975
#6  0x00000000004491be in main (argc=4, argv=0x7ffe4f160d38) at src/haproxy.c:1795


We use the state file as follows: servers are declared with the `disabled`
option in the configuration, and during scaling we update the server address
and mark the server as active through the socket. The 169.254.0.9 address is a
dummy address for the disabled servers.
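
For reference, the scaling step can be sketched roughly as below. This is an
illustration rather than our exact tooling: the helper names are hypothetical,
and it assumes the admin-level socket from the configuration above
(/run/haproxy.sock) and the standard `set server ... addr` and
`set server ... state ready` socket commands.

```python
import socket


def build_enable_commands(backend, server, addr):
    """Build the socket commands that repoint a placeholder server and enable it."""
    target = f"{backend}/{server}"
    return [
        f"set server {target} addr {addr}",
        f"set server {target} state ready",
    ]


def send_command(sock_path, command):
    """Send one command to the HAProxy stats socket and return the reply.

    In non-interactive mode HAProxy closes the connection after answering,
    so reading until EOF collects the whole response.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def enable_server(sock_path, backend, server, addr):
    """Run the address update followed by the state change; return the replies."""
    return [send_command(sock_path, cmd)
            for cmd in build_enable_commands(backend, server, addr)]
```

For example, enable_server("/run/haproxy.sock", "testservers", "testserver20",
"10.0.19.17") would correspond to the address shown for testserver20 in the
state file. The state file itself can then be refreshed from the output of
`show servers state` on the same socket.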

Can someone take a look? I couldn't find any related bugs fixed in 1.8.

Thanks
-- Raghu
