On Mon, Sep 11, 2023 at 08:39:30PM +0200, Christopher Faulet wrote: > Le 11/09/2023 à 15:12, Cedric Paillet a écrit : > > This adds a new 'FULL' status to the Prometheus metric 'srv_state'. It > > helps identify servers that have exceeded their maxconn limit and cannot > > accept new connections. > > Rename server_has_room to !server_is_full to matches what's used at a few > > places in the doc in association with servers or backends being "full". > > Thanks Cedric, > > I have no real opinion on the way the FULL state is detected, I will let > Willy react :)
Yeah the method is correct. > However, to be merged, the patch should be split, to first rename the > function and then change the metric into the Prometheus exporter. This will > simplify reviews on bugs and ease eventual reverts. > > But I'm a bit annoyed to hijack the servers check status this way. First > because, this change the promex metrics and not everyone is interested by > this info and may want to keep the "UP" status. That's a fair point, I didn't realize it was going to replace the UP status. I seem to remember that we're having UP/something in some state in the stats, I don't remember which one :-/ > Then I'm also annoyed to > have a difference between the stats applet and the promex one. The first one > will still report a server as "UP" while the second one will be able to > report it as "FULL". In fact we're already having something very similar for listeners (OPEN, FULL etc) but it has been there for ages, and I agree that here we might start to make some monitoring fail because when the server is used at full capacity it will report FULL without indicating if it's UP or DOWN. Maybe just "UP/FULL" and "DOWN/FULL" could be OK. I'm just having a doubt, I suspect that the "UP/something" I was thinking about was probably about the DRAIN state, and here it's again orthogonal. > I guess it could be a new server metric. A boolean to report it as full or > not. And an aggregate metric can also be added for the backends. But we > should be careful about performances. It could be good to think to merge the > processing of aggregated metrics to not loop on the server list several > times. Finally, if a new Prometheus metric is added for servers, it could be > good to see if it could/should be added on the stats page too. I agree and am not much worried for the stats page, it could be in a tooltip over the state for example. We just need to make sure we report something accurate and not conflicting with another state. Willy

