On Mon, 19 Oct 2020 at 10:00, Christopher Faulet <[email protected]> wrote:
>
> On 16/10/2020 at 10:04, Christopher Faulet wrote:
> > On 13/10/2020 at 14:53, Peter Statham wrote:
> >> Hello,
> >>
> >> We've found an issue when using agent checks in conjunction with the
> >> weighted least connections algorithm in multithreaded mode. It seems to
> >> me as if it is possible for next_eweight in struct server to be modified
> >> in another thread during the execution of fwlc_srv_reposition. If
> >> next_eweight is set to zero then a division by zero occurs on line 59 in
> >> src/lb_fwlc.c in fwlc_queue_srv.
> >>
> >> I notice that in haproxy-2.0.18 this section of code is protected by
> >> HA_SPINLOCKs and I've been unable to replicate this issue in that version.
> >>
> >> I've written an agent (attached), bad_agent.py, which provokes this
> >> condition by switching randomly between 1 and 0 percent. I also include a
> >> minimal configuration, cfg (also attached), which seems sufficient to
> >> cause the issue. With these two running,
> >> “ab -n 5000000 -c 500 http://192.168.92.1:8080/” will quickly crash the
> >> haproxy process.
> >>
> >> I include links to a coredump and the binary that generated it
> >> (unstripped). The backtrace of thread 1 follows.
> >>
> >
> > Hi,
> >
> > Thanks for the reproducer. I'm able to crash HAProxy too using your config
> > and your agent. It seems to only crash on the 1.8. I'll investigate.
> >
>
> Hi,
>
> In fact, it fails in all branches supporting threads. The leastconn and first
> load-balancing algorithms are affected by this bug. In leastconn, it may crash
> because of the division by 0 when the server weight is set to 0. But for both
> algorithms, the server tree may also be corrupted, leading to stranger and
> undefined bugs.
>
> I pushed a fix (commit 26a52a) and backported it as far as 1.8. So, it should
> be fixed in all branches now.
>
> Thanks!
> --
> Christopher Faulet
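For anyone finding this thread later, here is a minimal C sketch of the failure mode described above and the general shape of a fix: the thread that repositions a server reads the weight once, under the same lock the weight writer uses, and dequeues the server instead of dividing when the weight is 0. All names here (struct server, eweight, EWGHT_MAX, reposition_srv, set_weight) are hypothetical simplifications, not HAProxy's actual code, and the sketch is not taken from commit 26a52a.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for illustration only; these are NOT
 * HAProxy's real structures. EWGHT_MAX plays the role of a
 * weight scaling constant. */
#define EWGHT_MAX 256u

struct server {
    pthread_mutex_t lock;   /* protects every field below */
    unsigned int eweight;   /* effective weight; an agent may set it to 0 */
    unsigned int served;    /* current number of connections */
    unsigned int key;       /* position in the least-connections tree */
    int queued;             /* 1 if the server is in the tree */
};

/* Recompute the server's tree position. Without the lock, another
 * thread could set eweight to 0 between a "!= 0" check and the
 * division, which is the crash reported above. Here the weight is
 * read exactly once while holding the lock, so the check and the
 * division always see the same value. */
static void reposition_srv(struct server *s)
{
    pthread_mutex_lock(&s->lock);
    unsigned int w = s->eweight;   /* single read, under the lock */
    if (w == 0) {
        s->queued = 0;             /* weight 0: take it out of the tree */
    } else {
        s->key = EWGHT_MAX * s->served / w;
        s->queued = 1;
    }
    /* Invariant the unlocked version could violate. */
    assert(!s->queued || w != 0);
    pthread_mutex_unlock(&s->lock);
}

/* What an agent-check update would do in this sketch: change the
 * weight under the same lock, then reposition the server. */
static void set_weight(struct server *s, unsigned int w)
{
    pthread_mutex_lock(&s->lock);
    s->eweight = w;
    pthread_mutex_unlock(&s->lock);
    reposition_srv(s);
}
```

Compile with `-pthread`. With a weight of 4 and 2 served connections, `reposition_srv` gives a key of 256 * 2 / 4 = 128; after `set_weight(&s, 0)` the server is simply dequeued rather than triggering a division by zero.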
Thank you for making a patch for this bug, Christopher. I've checked out the 1.8 master (I would have done so sooner, but I'm afraid I didn't have access to my email last week) and I'm happy to say I can't replicate the crash. :)

--
Peter Statham

