What value did you use?

On my Ubuntu desktop, /proc/sys/net/core/wmem_default and wmem_max are both 212992, which is a fair few DNS replies.
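
For anyone wanting to compare their own box, here is a rough sketch that reads the two values and turns them into an approximate reply count. The helper names and the per_packet_cost figure of 1280 bytes are assumptions made up for illustration: the kernel charges each queued packet for more than its payload, and the real figure depends on the kernel and the driver.

```python
from pathlib import Path


def read_sysctl_int(name, root="/proc/sys/net/core"):
    """Return the integer value of a net.core sysctl, or None if unreadable."""
    try:
        return int((Path(root) / name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def estimate_queued_replies(buffer_bytes, per_packet_cost=1280):
    """Rough number of replies that fit in the buffer.

    per_packet_cost is an ASSUMED accounting cost per queued packet,
    not a measured value.
    """
    return buffer_bytes // per_packet_cost


if __name__ == "__main__":
    for name in ("wmem_default", "wmem_max"):
        value = read_sysctl_int(name)
        if value is None:
            print(f"{name}: not available on this system")
        else:
            replies = estimate_queued_replies(value)
            print(f"{name}: {value} bytes, roughly {replies} queued replies")
```

With the assumed cost, 212992 bytes works out to 166 queued replies, so the order of magnitude is hundreds rather than thousands.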


Simon.


On 16/05/2022 18:34, Tom Keddie wrote:
Hi Simon,

Thanks for your response.  I don't have the detailed logs, but it's a noisy QA wireless environment where clients are coming and going a lot. For example, in syslog I could see instances where we would get a DHCP request and then an L2 wireless disassociate message would appear immediately afterwards. That response isn't going to be deliverable as unicast (although for DHCP it might fall back to broadcast eventually).

As we know, DNS isn't logged in such a manner, but you could see the same scenario unfolding where we get a bunch of DNS requests, the client drops off immediately afterwards and the responses can't be delivered. When there are a lot of requests or a lot of clients you can see how the socket buffer would fill.

Increasing the socket buffers as I described below allowed the test to run for the required 96 hours; without it we weren't making it past the 48 hour mark.
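
For comparison with the system-wide wmem_default change, the per-socket approach (SO_SNDBUF on just the sockets that need it) would look roughly like the sketch below. This is not dnsmasq code, and set_send_buffer is a name invented for illustration. On Linux the kernel limits the request to wmem_max and reports about double the value asked for, so the function returns whatever was actually granted.

```python
import socket


def set_send_buffer(sock, requested_bytes):
    """Ask the kernel for a send buffer of requested_bytes.

    Returns the size the kernel reports afterwards, which may differ
    from the request (Linux caps it at wmem_max and doubles it).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        after = set_send_buffer(sock, 1 << 20)
        print(f"SO_SNDBUF before: {before}, after: {after}")
```

If the "after" value stops growing no matter what is requested, wmem_max is the limit and would need raising as well.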

A dynamic solution might work provided it was carefully bounded to prevent DoS.  If you have something you'd like us to test I can probably arrange a time slot; it's a busy setup that needs lots of hardware though.
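
To make the bounded idea concrete, here is a minimal sketch of one way it could work: when a send fails with EAGAIN, double the send buffer request up to a fixed cap and retry once. The names grow_send_buffer, send_with_growth and MAX_SNDBUF, and the 1 MiB cap, are all hypothetical; nothing here reflects how dnsmasq is or will be implemented.

```python
import errno
import socket

MAX_SNDBUF = 1 << 20  # assumed upper bound on the request, picked for illustration


def grow_send_buffer(sock, cap=MAX_SNDBUF):
    """Double the SO_SNDBUF request, never asking for more than cap.

    Returns True only if the reported buffer size actually increased.
    Note that Linux may report about twice the requested value.
    """
    current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    if current >= cap:
        return False  # already at the bound, refuse to grow further
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, min(current * 2, cap))
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF) > current


def send_with_growth(sock, data, addr, cap=MAX_SNDBUF):
    """Send one datagram; on a full send queue, grow the buffer and retry once."""
    try:
        return sock.sendto(data, addr)
    except OSError as exc:
        if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
            raise  # unrelated failure, leave it to the caller
        if not grow_send_buffer(sock, cap):
            raise  # bound reached (or kernel refused), report the original error
        return sock.sendto(data, addr)
```

The cap is what stops a flood of undeliverable replies from growing the buffer without limit; once it is reached the caller sees the original error, just as it does today.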

Thanks,
Tom Keddie

PS. This is a controlled environment (as much as you can control wifi); there are no malicious actors or intent in this scenario.  It's a soak test with a large variety of clients all doing busy work like video streaming etc.


On Fri, May 13, 2022 at 12:48 PM Simon Kelley <si...@thekelleys.org.uk> wrote:



    On 10/05/2022 16:40, Tom Keddie via Dnsmasq-discuss wrote:
     > Hi All,
     >
     >     I think you're saying that it's not surprising that dnsmasq is not
     >     reading from the socket because the send queue is also full.
     >
     >
     > As per this thread on netdev
     > (https://lore.kernel.org/netdev/cabuuw65r3or9hehsmt_isvx1f-7b6ecppdr+bnr6f6wbkpn...@mail.gmail.com/)
     > it seems we were consuming the socket send buffer with pending packets
     > waiting for ARP responses that were never coming.  This was causing
     > failures sending to devices that were still live.
     >
     > As per that thread we increased the /proc/sys/net/core/wmem_default
     > value so all sockets will have larger send buffers (the device has very
     > few sockets in use). It might be useful to add dnsmasq config options to
     > increase SO_SNDBUF on the dhcp and dns sockets to allow more granular
     > control.
     >
     > Thanks, Tom Keddie

    So queries are being received, and answered, but the reply is being
    dropped by the kernel because the send queue is full of replies to dead
    hosts? If the hosts are dead, where are the queries coming from to
    generate these blocked replies?

    It might be sensible to automatically increase the send queue length
    when a packet send gets EAGAIN, at least the first time, but I'd like to
    understand exactly what's going on first.


    Simon.

     >
     > _______________________________________________
     > Dnsmasq-discuss mailing list
     > Dnsmasq-discuss@lists.thekelleys.org.uk
     > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss


