What value did you use?
On my Ubuntu desktop, /proc/sys/net/core/wmem_default and wmem_max are
both 212992 which is a fair few DNS replies.
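
As a rough illustration of that estimate, the sketch below reads the two sysctl files and divides by an assumed per-reply cost. The helper names are hypothetical, and `overhead_bytes` is an illustrative guess: the kernel charges per-packet bookkeeping against the buffer as well as payload, so the real figure will differ.

```python
from pathlib import Path


def read_sysctl_int(path):
    """Return the integer stored in a /proc/sys file, or None if unreadable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def replies_that_fit(buffer_bytes, payload_bytes=512, overhead_bytes=768):
    """Rough number of queued replies; overhead_bytes is an assumed kernel cost."""
    return buffer_bytes // (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    for name in ("wmem_default", "wmem_max"):
        value = read_sysctl_int(f"/proc/sys/net/core/{name}")
        if value is None:
            print(f"{name}: not available on this system")
        else:
            print(f"{name}: {value} bytes, roughly {replies_that_fit(value)} replies")
```

With these assumed sizes, a 212992-byte buffer holds on the order of a couple of hundred queued replies.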
Simon.
On 16/05/2022 18:34, Tom Keddie wrote:
Hi Simon,
Thanks for your response. I don't have the detailed logs, but it's a
noisy QA wireless environment where clients are coming and going a lot.
For example, in syslog I could see instances where we would get a DHCP
request and an L2 wireless disassociate message would appear immediately
afterwards. That response isn't going to be deliverable as unicast
(although for DHCP it might eventually fall back to broadcast).
As we know, DNS isn't logged in such a manner, but you could see the same
scenario unfolding where we get a bunch of DNS requests, the client
drops off immediately afterwards and the responses can't be delivered.
When there are a lot of requests or a lot of clients, you can see how the
socket buffer would fill.
Increasing the socket buffers as I described below allowed the test to
run for the required 96 hours; without it we weren't making it past the
48 hour mark.
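
For comparison with the system-wide wmem_default change, a per-socket request via SO_SNDBUF could look like the minimal sketch below. This is not dnsmasq code; `make_udp_socket` and `effective_sndbuf` are hypothetical helper names, and the size passed in is only an example.

```python
import socket


def make_udp_socket(sndbuf_bytes):
    """Create a UDP socket and request a send buffer of sndbuf_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # On Linux the kernel doubles the requested value and caps it at
    # net.core.wmem_max, so the effective size may differ from the request.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    return sock


def effective_sndbuf(sock):
    """Return the send buffer size the kernel actually applied."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    s = make_udp_socket(1024 * 1024)
    print("effective SO_SNDBUF:", effective_sndbuf(s))
    s.close()
```

Reading the value back matters because a request above wmem_max is silently reduced rather than rejected.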
A dynamic solution might work provided it was carefully bounded to prevent
DoS. If you have something you'd like us to test I can probably arrange a
time slot; it's a busy setup that needs lots of hardware though.
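
One way the bounded dynamic approach could look, sketched in Python rather than dnsmasq's C: on EAGAIN from a non-blocking send, grow SO_SNDBUF once, up to a fixed cap, and retry. `MAX_SNDBUF`, `next_sndbuf` and `send_with_growth` are hypothetical names, and the 4 MiB cap is an arbitrary assumption.

```python
import socket

MAX_SNDBUF = 4 * 1024 * 1024  # assumed upper bound, chosen only for illustration


def next_sndbuf(current, cap=MAX_SNDBUF):
    """Double the buffer size, never exceeding cap."""
    return min(max(current, 1) * 2, cap)


def send_with_growth(sock, data, addr, cap=MAX_SNDBUF):
    """Send on a non-blocking socket; on EAGAIN grow SO_SNDBUF once and retry.

    Returns the number of bytes sent, or None if the datagram was dropped.
    """
    try:
        return sock.sendto(data, addr)
    except BlockingIOError:  # raised for EAGAIN / EWOULDBLOCK
        current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        target = next_sndbuf(current, cap)
        if target <= current:
            return None  # already at the cap: drop instead of growing further
        # Linux may apply more than requested (it doubles the value) and
        # still limits the result to net.core.wmem_max.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, target)
        try:
            return sock.sendto(data, addr)
        except BlockingIOError:
            return None
```

The cap is what keeps a flood of undeliverable replies from growing kernel memory use without limit.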
Thanks,
Tom Keddie
PS: this is a controlled environment (as much as you can control Wi-Fi);
there are no malicious actors nor intent in this scenario. It's a soak
test with a large variety of clients all doing busy work like video
streaming etc.
On Fri, May 13, 2022 at 12:48 PM Simon Kelley <si...@thekelleys.org.uk> wrote:
On 10/05/2022 16:40, Tom Keddie via Dnsmasq-discuss wrote:
> Hi All,
>
> I think you're saying that it's not surprising that dnsmasq is not
> reading from the socket because the send queue is also full.
>
>
> As per this thread on netdev
> (https://lore.kernel.org/netdev/cabuuw65r3or9hehsmt_isvx1f-7b6ecppdr+bnr6f6wbkpn...@mail.gmail.com/)
> it seems we were consuming the socket send buffer with pending packets
> waiting for ARP responses that were never coming. This was causing
> failures sending to devices that were still live.
>
> As per that thread we increased the /proc/sys/net/core/wmem_default
> value so all sockets will have larger send buffers (the device has very
> few sockets in use). It might be useful to add dnsmasq config options to
> increase SO_SNDBUF on the dhcp and dns sockets to allow more granular
> control.
>
> Thanks, Tom Keddie
So queries are being received, and answered, but the reply is being
dropped by the kernel because the send queue is full of replies to dead
hosts? If the hosts are dead, where are the queries coming from to
generate these blocked replies?
It might be sensible to automatically increase the send queue length
when a packet send gets EAGAIN, at least the first time, but I'd like to
understand exactly what's going on first.
Simon.
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss