If it's any help, here's `show quic full` for a stalled connection:
* 0x7f04a0bb01c0[00]: scid=6ff4eed3a45373ea........................ dcid=........................................
  loc. TPs: odcid=b8bf539a98f591d5 iscid=6ff4eed3a45373ea
    midle_timeout=30000ms mudp_payload_sz=2048 ack_delay_exp=3 mack_delay=25ms act_cid_limit=8
    md=1687140 msd_bidi_l=16380 msd_bidi_r=16380 msd_uni=16380 ms_bidi=100 ms_uni=3
    (no_act_migr,stless_rst_tok)
  rem. TPs: iscid=
    midle_timeout=0ms mudp_payload_sz=65527 ack_delay_exp=3 mack_delay=25ms act_cid_limit=64
    md=16777216 msd_bidi_l=2097152 msd_bidi_r=2097152 msd_uni=2097152 ms_bidi=0 ms_uni=103
  st=opened mux=ready expire=19s
  fd=-1 local_addr=139.162.161.165:443 foreign_addr=<ip>:49206
  [initl] rx.ackrng=1 tx.inflight=0
  [hndshk] rx.ackrng=1 tx.inflight=0
  [01rtt] rx.ackrng=1 tx.inflight=0
  srtt=19 rttvar=2 rttmin=18 ptoc=0 cwnd=17233251 mcwnd=17233251 sentpkts=15017 lostpkts=0
  sblockebidi=1
  | stream=3 off=0 ack=17 | stream=548 off=0 ack=0 | stream=552 off=0 ack=0
  | stream=560 off=0 ack=7853 | stream=564 off=0 ack=0 | stream=568 off=0 ack=0
  | stream=572 off=0 ack=0 | stream=576 off=0 ack=0 | stream=580 off=0 ack=0
  | stream=584 off=0 ack=0 | stream=588 off=0 ack=0 | stream=592 off=0 ack=0
  | stream=600 off=0 ack=0 | stream=604 off=0 ack=0 | stream=608 off=0 ack=0
  | stream=612 off=0 ack=0 | stream=616 off=0 ack=0 | stream=620 off=0 ack=0
  | stream=624 off=0 ack=0 | stream=628 off=0 ack=0 | stream=632 off=0 ack=0
  | stream=636 off=0 ack=0 | stream=640 off=0 ack=0 | stream=644 off=0 ack=0
  | stream=648 off=0 ack=0 | stream=652 off=0 ack=0 | stream=656 off=0 ack=0
  | stream=660 off=0 ack=0 | stream=664 off=0 ack=0 | stream=668 off=0 ack=0
  | stream=672 off=0 ack=0 | stream=676 off=0 ack=0 | stream=680 off=0 ack=0
  | stream=684 off=0 ack=0 | stream=688 off=0 ack=0 | stream=692 off=0 ack=0
  | stream=696 off=0 ack=0 | stream=700 off=0 ack=0 | stream=704 off=0 ack=0
  | stream=708 off=0 ack=0 | stream=712 off=0 ack=0 | stream=716 off=0 ack=0
  | stream=720 off=0 ack=0 | stream=724 off=0 ack=0 | stream=728 off=0 ack=0
  | stream=732 off=0 ack=0 | stream=736 off=0 ack=0 | stream=740 off=0 ack=0
  | stream=744 off=0 ack=0 | stream=748 off=0 ack=0 | stream=752 off=0 ack=0
  | stream=756 off=0 ack=0 | stream=760 off=0 ack=0 | stream=764 off=0 ack=0
  | stream=768 off=0 ack=0 | stream=772 off=0 ack=0 | stream=776 off=0 ack=0
  | stream=780 off=0 ack=0 | stream=784 off=0 ack=0 | stream=788 off=0 ack=0
  | stream=792 off=0 ack=0 | stream=796 off=0 ack=0 | stream=800 off=0 ack=0
  | stream=804 off=0 ack=0
HAProxy thinks everything is fine… but the client never sees the responses.
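
For reference, the dump above came off the stats socket, roughly like so (the
socket path is from our setup, so adjust to yours):

  echo "show quic full" | socat stdio /var/run/haproxy.sock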
Best,
Luke
—
Luke Seelenbinder
Stadia Maps | Founder & CEO
stadiamaps.com
> On Sep 22, 2023, at 15:05, Luke Seelenbinder
> <[email protected]> wrote:
>
> Hi Tristan,
>
> I actually read your thread a few times last night. It was helpful, but I
> couldn't match anything I'm seeing to what you described (though it's
> entirely likely there's some overlap!).
>
>> Out of curiosity, what version of HAProxy are you running?
>
> We're currently running the latest release (2.8.3), compiled against quictls 1.1.1.
>
>> I don’t have a magic answer but in
>> https://github.com/haproxy/haproxy/issues/2095#issuecomment-1570547484 we’ve
>> been looking at performance issues with H3/QUIC over time, and there’s a
>> couple of workarounds currently relevant (~ from the comment linked),
>> notably to use a short timeout connect (currently using 500ms here), and to
>> comment out timeout client-fin.
>
> That looks interesting. Our current timeout values are:
>
> # connect [to a backend server]
> timeout connect 12400ms
> # wait for a full request [from a client]
> timeout http-request 9300ms
> # wait for any data from the server (time to first byte/header)
> timeout server 60s
> # wait for the client (keep-alive for h1.1, client for h1.1 & h2)
> timeout http-keep-alive 15m
> timeout client 15m
>
> These would also apply to H2 and HTTP/1.1, so I'm hesitant to modify them too
> significantly.
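>
> If we do end up experimenting with the workarounds from the linked comment,
> I'd sketch it roughly like this (the 500ms value comes from that comment, not
> from anything we've tested ourselves):
>
>   # short connect timeout, per the linked workaround
>   timeout connect 500ms
>   # timeout client-fin left unset entirely, also per the workaround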
>
>> I see you explicitly set per-connection socket ownership, which is
>> definitely the right thing to do, but you should check that it is actually
>> working « for real » and not silently disabled due to permissions somewhere
>> inside the worker processes.
>> To do that, have a look at the output of ‘show fd’ and check that H3 conns
>> aren’t all with fd=-1 (instead it should be a different positive fd number
>> per H3 conn)
>
> That's really good advice. I almost tried setcap last night, but assumed it
> would throw errors if it wasn't working… I'll check 'show fd' next.
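>
> Assuming our socket path, something like this should tell the story (per your
> advice, I'll be looking for H3 connections that aren't all stuck at fd=-1):
>
>   echo "show fd" | socat stdio /var/run/haproxy.sock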
>
> Here's the JSFiddle: https://jsfiddle.net/vqn3sea7/show
> I can consistently trigger it in Safari on iOS by zooming in a few times and
> then swiping around fairly quickly (it ends up queueing a lot of requests and
> eventually hanging forever).
>
> Thanks for the help! I'll see what I can turn up…
>
> Best,
> Luke
>
> —
> Luke Seelenbinder
> Stadia Maps | Founder & CEO
> stadiamaps.com
>
>> On Sep 22, 2023, at 14:56, Tristan <[email protected]> wrote:
>>
>> Hi Luke,
>>
>>> Under some conditions (especially high request rate), H3 connections simply
>>> stall. This appears to be more prevalent in some browsers than others
>>> (Safari on iOS is the worst offender), and appears to coincide with high
>>> packet_loss in `show quic`.
>>
>> Out of curiosity, what version of HAProxy are you running?
>>
>> I don’t have a magic answer but in
>> https://github.com/haproxy/haproxy/issues/2095#issuecomment-1570547484 we’ve
>> been looking at performance issues with H3/QUIC over time, and there’s a
>> couple of workarounds currently relevant (~ from the comment linked),
>> notably to use a short timeout connect (currently using 500ms here), and to
>> comment out timeout client-fin.
>>
>> I see you explicitly set per-connection socket ownership, which is
>> definitely the right thing to do, but you should check that it is actually
>> working « for real » and not silently disabled due to permissions somewhere
>> inside the worker processes.
>> To do that, have a look at the output of ‘show fd’ and check that H3 conns
>> aren’t all with fd=-1 (instead it should be a different positive fd number
>> per H3 conn)
>>
>> I’d definitely like to see the JSFiddle if there’s a trick to the kind of
>> request affected to see if I can reproduce it on our nodes.
>>
>> Regards,
>> Tristan
>