Greetings Willy, > On Sep 3, 2024, at 11:48 PM, Willy Tarreau <w...@1wt.eu> wrote: > > Hello Aaron, > > On Tue, Sep 03, 2024 at 04:24:57PM -0400, Aaron Kuehler wrote: >> Allow the user to set the "initial state" of a server. > > Thank you very much for working on this one!
Thank you for the swift and clueful feedback, and kind words. Happy to have a go. I’m a bit out of my element, and grateful for the support. > >> Context: >> >> Servers are always set in an UP status by default. In >> some cases, further checks are required to determine if the server is >> ready to receive client traffic. >> >> This introduces the "init-state {up|down}" configuration parameter to >> the server. >> >> - when not set, the server is considered available as soon as a connection >> can be established. >> - when set to 'up', the server is considered available as soon as a >> connection can be established. >> - when set to 'down', the server is initially considered unavailable until >> it successfully completes its health checks. > > I'm having a difficulty understanding the first two cases, and it seems > from the patch that they are in fact the same. I think that "as soon as > a connection can be established" is confusing as it makes one think that > it still needs to perform one check (otherwise it's not clear what this > "connection" refers to). > > Thus I suggest that the first two lines are merged into one with an > explanation like this one: > > - when set to 'up' (the default), the server is considered immediately > available and will initiate a health check that can turn it to the DOWN > state immediately if it fails. > > And the same could be done in the doc, which contains the same text. > You’re 100% correct, the first two cases have the same behavior, and I agree my initial wording is confusing. I like the direction in which your suggestion is pointed and I’ll incorporate it into the commit message and documentation. >> The server's init-state is considered when the HAProxy instance >> is (re)started, a new server is detected (for example via service >> discovery / DNS resolution), a server exits maintenance, etc. > > You're right, it's important to mention this because it's far from being > obvious. Great point. I’ll also include this in the user-facing, configuration documentation. > > By analogy with the "up" state, the "down" should be at check_rise-1 > instead of zero I think, so that it's sufficient to succeed one check > to turn it on. Otherwise it can take a lot of time to introduce some > servers which are slowly checked. > > I'm just thinking about something, since I've read that complaint as well > a few times: in some environments, some servers are having difficulties > responding positively to initial health checks, and due to the way we're > starting at the check_rise value, the first failure is sufficient to turn > it down. A few users already asked for a way to turn the server completely > up. I think there could be value in having 3 different init states: > - 'down': needs to validate one check before turning up > - 'probe': the current situation, where it's up until the first test > makes it fail > - 'up': it starts fully up (check_rise+check_fall-1 IIRC). > > Or maybe even 4 states: > - "fully-down": 0, requires that all checks are valid before it turns up > - "down": check_rise-1, requires one successful check to turn up > - "up": check_rise, requires one faulty check to turn down > - "fully-up": check_rise+check_fall-1, requires that all checks fail to > turn down. > > And we'd default to "up" like in your patch. > > What do you think ? I’m no expert, but the latter sounds slightly more flexible. I’ll make this change. > > BTW your patch is super clean, I've only found one tiny thing (just to > say that I found something): > >> @@ -6504,7 +6526,11 @@ static int _srv_update_status_adm(struct server *s, >> enum srv_adm_st_chg_cause ca >> */ >> if (s->check.state & CHK_ST_ENABLED) { >> s->check.state &= ~CHK_ST_PAUSED; >> - s->check.health = s->check.rise; /* start OK but check >> immediately */ >> + if(s->init_state == SRV_INIT_STATE_DOWN) { > ^^ > missing space here. :-) > >> + s->check.health = 0; /* checks must pass before >> the server is considered UP */ >> + } else { >> + s->check.health = s->check.rise; /* start OK >> but check immediately */ >> + } >> } > Thank you for the kind words, and keen eyes. I’ll fix this, promptly. Cheers, Aaron