Screenshot for my previous post: https://smirnov.la/Screenshot%202024-09-05%20at%2009.42.38.png
On Thu, Sep 5, 2024 at 9:52 AM Danil Smirnov <danil.smir...@gmail.com> wrote:

> Hi,
>
> So I managed to isolate and reproduce the issue quite reliably.
>
> Every day exactly at 06:10 UTC time my dnsmasq container stops
> responding. During the event, I can successfully query my external DNS
> servers but not dnsmasq:
>
> dig domain.tld @172.18.0.250
>
> ; <<>> DiG 9.16.23-RH <<>> domain.tld @172.18.0.250
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> I see hundreds of errors like this in the system log:
>
> Sep 05 06:10:58 mm4.lax.icann.org dockerd[1150]:
> time="2024-09-05T06:10:58.464185887Z" level=error msg="[resolver] failed to
> query external DNS server" client-addr="udp:172.18.0.4:48552"
> dns-server="udp:172.18.0.250:53" error="read udp 172.18.0.4:48552->
> 172.18.0.250:53: i/o timeout" question=";_dmarc.domain.tld.\tIN\t TXT"
>
> However, there is nothing suspicious in the /var/log/messages and
> /var/log/cron that might explain what happened.
>
> Before the container restarted at 06:15, I tried to collect stats via the
> "kill --signal=USR1" command but the stats weren't posted in the logs -
> obviously, dnsmasq was so stuck it couldn't even process the signal.
> (However, I don't think stats would be helpful since the time of the event
> doesn't change even if I restart dnsmasq in between 6:10 events.)
>
> Resource-wise, it was an increase in memory consumption by dnsmasq when
> the issue started and then a spike in the middle of it (the time shown is 3
> hours later than UTC):
>
> [image: Screenshot 2024-09-05 at 09.42.38.png]
>
> I'm using these params
> <https://github.com/dockur/dnsmasq/blob/master/entry.sh#L14> plus
> "fast-dns-retry". Also tried adding "no-negcache" and "all-servers" but
> it didn't fix the issue.
>
> Any idea where to continue the investigation?
>
> Sincerely,
> Danil Smirnov
>
>
> On Sun, Aug 25, 2024 at 7:45 PM Danil Smirnov <danil.smir...@gmail.com>
> wrote:
>
>>
>> Hi Dimitry,
>>
>> On Sun, Aug 25, 2024 at 7:36 PM Dimitry Andric <
>> dimi...@unified-streaming.com> wrote:
>>
>>> Is there any way to reproduce this issue reliably? That is, some recipe
>>> that says: run this particular docker container, run some script that
>>> queries it, observe hang after N minutes?
>>>
>>
>> For now, I established a watchdog in my environment that will restart the
>> container on freeze while collecting some stats. I'm going to monitor the
>> issue for one more week (already spent a week debugging the issue). After
>> seeing some useful data I'll try to reproduce it.
>>
>> Sincerely,
>> Danil Smirnov
>>
>
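
For anyone who wants to script the "dig ... connection timed out" check quoted above, here is a minimal sketch in Python. It is not the watchdog mentioned in the thread, whose implementation was not posted. The function names build_query and probe, the 2-second timeout, and the fixed transaction ID are all assumptions made for illustration. The probe only tests whether the server answers a UDP query at all, which is the symptom described.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (RFC 1035 wire format) for `name`.

    qtype 1 is an A record; txid is a fixed, hypothetical transaction ID.
    """
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question trailer: QTYPE, then QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if `server` replies to a UDP query for `name` in time.

    Any reply carrying the same transaction ID counts as "alive"; the
    answer content is not inspected.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # includes socket.timeout
            return False
    return data[:2] == query[:2]
```

A watchdog loop could call probe("172.18.0.250", "domain.tld") at a fixed interval and log a timestamp, or trigger a container restart, whenever it returns False. The restart step is left out here because it depends on the local setup.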
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss