Greetings Willy,

> On Sep 3, 2024, at 11:48 PM, Willy Tarreau <w...@1wt.eu> wrote:
> 
> Hello Aaron,
> 
> On Tue, Sep 03, 2024 at 04:24:57PM -0400, Aaron Kuehler wrote:
>> Allow the user to set the "initial state" of a server.
> 
> Thank you very much for working on this one!

Thank you for the swift and clueful feedback, and kind words. Happy to have a 
go. I’m a bit out of my element, and grateful for the support. 

> 
>> Context:
>> 
>> Servers are always set in an UP status by default. In
>> some cases, further checks are required to determine if the server is
>> ready to receive client traffic.
>> 
>> This introduces the "init-state {up|down}" configuration parameter to
>> the server.
>> 
>> - when not set, the server is considered available as soon as a connection
>>  can be established.
>> - when set to 'up', the server is considered available as soon as a
>>  connection can be established.
>> - when set to 'down', the server is initially considered unavailable until
>>  it successfully completes its health checks.
> 
> I'm having a difficulty understanding the first two cases, and it seems
> from the patch that they are in fact the same. I think that "as soon as
> a connection can be established" is confusing as it makes one think that
> it still needs to perform one check (otherwise it's not clear what this
> "connection" refers to).
> 
> Thus I suggest that the first two lines are merged into one with an
> explanation like this one:
> 
>  - when set to 'up' (the default), the server is considered immediately
>    available and will initiate a health check that can turn it to the DOWN
>    state immediately if it fails.
> 
> And the same could be done in the doc, which contains the same text.
> 

You’re 100% correct, the first two cases have the same behavior, and I agree my 
initial wording is confusing. I like the direction in which your suggestion is 
pointed and I’ll incorporate it into the commit message and documentation. 

>> The server's init-state is considered when the HAProxy instance
>> is (re)started, a new server is detected (for example via service
>> discovery / DNS resolution), a server exits maintenance, etc.
> 
> You're right, it's important to mention this because it's far from being
> obvious.

Great point. I’ll also include this in the user-facing, configuration 
documentation. 

> 
> By analogy with the "up" state, the "down" should be at check_rise-1 
> instead of zero I think, so that it's sufficient to succeed one check
> to turn it on. Otherwise it can take a lot of time to introduce some
> servers which are slowly checked.
> 
> I'm just thinking about something, since I've read that complaint as well
> a few times: in some environments, some servers are having difficulties
> responding positively to initial health checks, and due to the way we're
> starting at the check_rise value, the first failure is sufficient to turn
> it down. A few users already asked for a way to turn the server completely
> up. I think there could be value in having 3 different init states:
>  - 'down': needs to validate one check before turning up
>  - 'probe': the current situation, where it's up until the first test
>    makes it fail
>  - 'up': it starts fully up (check_rise+check_fall-1 IIRC).
> 
> Or maybe even 4 states:
>  - "fully-down": 0, requires that all checks are valid before it turns up
>  - "down": check_rise-1, requires one successful check to turn up
>  - "up": check_rise, requires one faulty check to turn down
>  - "fully-up": check_rise+check_fall-1, requires that all checks fail to
>    turn down.
> 
> And we'd default to "up" like in your patch.
> 
> What do you think ?

I’m no expert, but the latter sounds slightly more flexible. I’ll make this 
change. 

> 
> BTW your patch is super clean, I've only found one tiny thing (just to
> say that I found something):
> 
>> @@ -6504,7 +6526,11 @@ static int _srv_update_status_adm(struct server *s, 
>> enum srv_adm_st_chg_cause ca
>>               */
>>              if (s->check.state & CHK_ST_ENABLED) {
>>                      s->check.state &= ~CHK_ST_PAUSED;
>> -                    s->check.health = s->check.rise; /* start OK but check 
>> immediately */
>> +                    if(s->init_state == SRV_INIT_STATE_DOWN) {
>                         ^^
> missing space here. :-)
> 
>> +                            s->check.health = 0; /* checks must pass before 
>> the server is considered UP */
>> +                    } else {
>> +                            s->check.health = s->check.rise; /* start OK 
>> but check immediately */
>> +                    }
>>              }
> 

Thank you for the kind words, and keen eyes. I’ll fix this, promptly.

Cheers, 
Aaron



Reply via email to