> What value did you use?

I went brute force and used 1M. The default on this ARM-based device was also 212992 before the change.
root@MH7601:~# cat /proc/sys/net/core/wmem_default
1048576

I agree that is a lot, but given that the ARP queue holds 101 entries, that is a lot of packets (especially if that might mean 101 hosts - I'm not sure whether the arp/neigh queue is per host or per request).

root@MH7601:~# cat /proc/sys/net/ipv4/neigh/default/unres_qlen
101

This is a very controlled environment; there are only about 30 sockets open at any time. This approach won't suit most people, but it saved me from crafting a patch into OpenWrt.

Thanks,
Tom

On Mon, May 16, 2022 at 10:52 AM Simon Kelley <si...@thekelleys.org.uk> wrote:

> What value did you use?
>
> On my Ubuntu desktop, /proc/sys/net/core/wmem_default and wmem_max are
> both 212992, which is a fair few DNS replies.
>
>
> Simon.
>
>
> On 16/05/2022 18:34, Tom Keddie wrote:
> > Hi Simon,
> >
> > Thanks for your response. I don't have the detailed logs, but it's a
> > noisy QA wireless environment where clients are coming and going a lot.
> > E.g. in syslog I could see instances where we would get a DHCP request
> > and then an L2 wireless disassociate message would appear immediately
> > afterwards; that response isn't going to be deliverable as unicast
> > (although for DHCP it might fall back to broadcast eventually).
> >
> > As we know, DNS isn't logged in such a manner, but you could see the
> > same scenario unfolding: we get a bunch of DNS requests, the client
> > drops off immediately afterwards, and the responses can't be delivered.
> > When there are a lot of requests or a lot of clients you can see how
> > the socket buffer would fill.
> >
> > Increasing the socket buffers as I described below allowed the test to
> > run for the required 96 hours; without it we weren't making it past
> > the 48-hour mark.
> >
> > A dynamic solution might work provided it was carefully bounded to
> > prevent DoS. If you have something you'd like us to test I can
> > probably arrange a time slot, though it's a busy setup that needs
> > lots of hardware.
> >
> > Thanks,
> > Tom Keddie
> >
> > ps. This is a controlled environment (as much as you can control
> > wifi); there are no malicious actors nor intent in this scenario.
> > It's a soak test with a large variety of clients all doing busy work
> > like video streaming etc.
> >
> >
> > On Fri, May 13, 2022 at 12:48 PM Simon Kelley <si...@thekelleys.org.uk> wrote:
> >
> >     On 10/05/2022 16:40, Tom Keddie via Dnsmasq-discuss wrote:
> >     > Hi All,
> >     >
> >     > I think you're saying that it's not surprising that dnsmasq is
> >     > not reading from the socket because the send queue is also full.
> >     >
> >     > As per this thread on netdev
> >     > (https://lore.kernel.org/netdev/cabuuw65r3or9hehsmt_isvx1f-7b6ecppdr+bnr6f6wbkpn...@mail.gmail.com/)
> >     > it seems we were consuming the socket send buffer with pending
> >     > packets waiting for ARP responses that were never coming. This
> >     > was causing failures sending to devices that were still live.
> >     >
> >     > As per that thread we increased the /proc/sys/net/core/wmem_default
> >     > value so all sockets will have larger send buffers (the device
> >     > has very few sockets in use).
> >     > It might be useful to add
> >     > dnsmasq config options to increase SO_SNDBUF on the dhcp and
> >     > dns sockets to allow more granular control.
> >     >
> >     > Thanks, Tom Keddie
> >
> >     So queries are being received, and answered, but the reply is
> >     being dropped by the kernel because the send queue is full of
> >     replies to dead hosts? If the hosts are dead, where are the
> >     queries coming from to generate these blocked replies?
> >
> >     It might be sensible to automatically increase the send queue
> >     length when a packet send gets EAGAIN, at least the first time,
> >     but I'd like to understand exactly what's going on first.
> >
> >
> >     Simon.
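For illustration, the per-socket alternative Tom floats above - a dnsmasq option that raises SO_SNDBUF on its dns and dhcp sockets - would boil down to a setsockopt() call like the one below. This is a minimal standalone sketch, not dnsmasq source; the socket setup and the 1 MB figure are only examples. Note that an unprivileged setsockopt(SO_SNDBUF) is silently clamped to net.core.wmem_max, which is part of why raising the sysctls was needed here.

/* sndbuf-sketch.c - illustrative only, not dnsmasq source. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1) {
        perror("socket");
        return 1;
    }

    int want = 1024 * 1024; /* 1 MB, the value used in the soak test above */

    /* Ask for a bigger send buffer. Without CAP_NET_ADMIN the kernel
       silently clamps this to net.core.wmem_max; SO_SNDBUFFORCE can
       exceed that cap but needs the capability. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) == -1)
        perror("setsockopt(SO_SNDBUF)");

    /* Read back the effective size - the kernel stores double the
       requested value to account for its bookkeeping overhead. */
    int got;
    socklen_t len = sizeof(got);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len) == 0)
        printf("SO_SNDBUF is now %d bytes\n", got);

    close(fd);
    return 0;
}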
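Simon's idea of growing the buffer automatically might look something like the following hypothetical sketch. The doubling policy, the ceiling, and the send_with_grow() helper name are all invented here for illustration, and the EAGAIN/EWOULDBLOCK check assumes a non-blocking socket (a blocking UDP send would stall rather than fail when the queue is full).

/* Hypothetical sketch of growing SO_SNDBUF on demand: when a send
 * fails with EAGAIN because the send queue is full, double the buffer
 * (up to a bound, to limit DoS exposure) and retry once. */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

#define SNDBUF_CEILING (4 * 1024 * 1024) /* arbitrary illustrative bound */

static ssize_t send_with_grow(int fd, const void *buf, size_t len,
                              const struct sockaddr *to, socklen_t tolen)
{
    ssize_t n = sendto(fd, buf, len, 0, to, tolen);
    if (n != -1 || (errno != EAGAIN && errno != EWOULDBLOCK))
        return n; /* success, or an error unrelated to a full queue */

    /* Send queue full: look up the current buffer size. */
    int cur;
    socklen_t optlen = sizeof(cur);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &cur, &optlen) == -1)
        return -1;

    int bigger = cur * 2;
    if (bigger > SNDBUF_CEILING)
        return -1; /* refuse to grow without bound */

    /* Still clamped to wmem_max unless SO_SNDBUFFORCE is available. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bigger, sizeof(bigger)) == -1)
        return -1;

    return sendto(fd, buf, len, 0, to, tolen); /* one retry */
}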