On Mon, Mar 19, 2018 at 08:28:14PM +0100, William Dauchy wrote:
> On Mon, Mar 19, 2018 at 07:28:16PM +0100, Willy Tarreau wrote:
> > Threading was clearly released with an experimental status, just like
> > H2, because we knew we'd be facing some post-release issues in these
> > two areas that are hard to get 100% right at once. However I consider
> > that the situation has got much better, and to confirm this, both of
> > these are now enabled by default in HapTech's products. With this said,
> > I expect that over time we'll continue to see a few bugs, but not more
> > than what we're seeing in various areas. For example, we didn't get a
> > single issue on haproxy.org since it was updated to the 1.8.1 or so,
> > 3 months ago. So this is getting quite good.
> 
> ok, it was not clear to me that this was experimental, since it was
> quite widely advertised in several blog posts, but I probably missed
> something.

For me, "experimental" simply means "we did our best to ensure it works
but we're realistic and know that bug-free doesn't exist, so a risk remains
that a bug will be hard enough to fix that you'll have to disable the
feature for the time it takes to fix it". This issue between threads and
the queue is one such example. Some of the bugs faced on H2 requiring
heavy changes were other examples. But overall we know these features
are highly demanded and we're committed to making them work fine :-)

> > Also if you're running with nbproc > 1 instead, the maxconn setting is
> > not really respected since it becomes per-process. When you run with
> > 8 processes it doesn't mean much anymore, or you need to have small
> > maxconn settings, implying that sometimes a process might queue some
> > requests while there are available slots in other processes. Thus I'd
> > argue that the threads here significantly improve the situation by
> > allowing all connection slots to be used by all CPUs, which is a real
> > improvement which should theoretically show you lower latencies.
> 
> thanks for these details. We will run some tests on our side as well;
> the commit message made me worried about the last percentile of
> requests which might have crazy numbers sometimes.
> I now better understand we are speaking about 1.75 extra microseconds.
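To make the per-process maxconn point concrete, here is a minimal config
sketch showing the two modes as alternatives (the figures are hypothetical,
just for illustration) :

```
# Alternative A: processes. maxconn applies per process, so the total
# is 8 x 2000 slots, but a loaded process may queue requests while
# free slots remain in the other processes.
global
    nbproc   8
    maxconn  2000

# Alternative B: threads (1.8+). One process, one shared maxconn,
# so every connection slot is usable by every CPU.
global
    nbthread 8
    maxconn  16000
```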

I'm still interested in knowing if you find crazy last percentile values.
We've had that a very long time ago (version 1.3 or so) when some pending
conns were accidentally skipped, so I know how queues can amplify small
issues. The only real risk here in my opinion is that the sync point was
only used for health checks till now so it was running at low loads and
if it had any issue, it would likely have remained unnoticed. But the code
is small enough to be audited, and after re-reading it this afternoon I
found it fine.

> > Note that if this is of interest to you, it's trivial to make haproxy
> > run in busy polling mode, and in this case the performance increases to
> > 30900 conn/s, at the expense of eating all your CPU (which possibly you
> > don't care about if latency is your worst enemy). We can possibly
> > even improve this to ensure that it's done only when there are existing
> > sessions on a given thread. Let me know if this is something that could
> > be of interest to you, as I think we could make this configurable and
> > bypass the sync point in this case.
> 
> Making it configurable is definitely of interest to us.
> I will try to have a look as well.

If you want to run a quick test with epoll, just apply this dirty hack:

diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index b98ca8c..7bafd16 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -116,7 +116,9 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
        fd_nbupdt = 0;
 
        /* compute the epoll_wait() timeout */
-       if (!exp)
+       if (1)
+               wait_time = 0;
+       else if (!exp)
                wait_time = MAX_DELAY_MS;
        else if (tick_is_expired(exp, now_ms)) {
                activity[tid].poll_exp++;

Please note that as-is, it's suboptimal because it will increase
contention in other places, causing the performance to be a bit lower in
certain situations. I do have some experimental code to loop on epoll
instead, but it's not completely stable yet. We can exchange on this later
if you want. But feel free to apply this to your latency tests.

> > We noticed a nice performance boost on the last one with many cores
> > (24 threads, something like +40% on connection rate), but we'll probably
> > see even better once the rest is addressed.
> 
> indeed, I remember we spoke about those improvements at the last meetup.
> nice work, 1.9 looks already interesting from this point of view!

In fact 1.8's target was to get threads working and in good enough
shape to build on. 1.9's target will be to make them even faster :-)

Willy
