Screenshot for my previous post:
https://smirnov.la/Screenshot%202024-09-05%20at%2009.42.38.png

On Thu, Sep 5, 2024 at 9:52 AM Danil Smirnov <danil.smir...@gmail.com>
wrote:

> Hi,
>
> So I managed to isolate and reproduce the issue quite reliably.
>
> Every day at exactly 06:10 UTC, my dnsmasq container stops
> responding. During the event, I can successfully query my external DNS
> servers but not dnsmasq:
>
> dig domain.tld @172.18.0.250
>
> ; <<>> DiG 9.16.23-RH <<>> domain.tld @172.18.0.250
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> I see hundreds of errors like this in the system log:
>
> Sep 05 06:10:58 mm4.lax.icann.org dockerd[1150]:
> time="2024-09-05T06:10:58.464185887Z" level=error msg="[resolver] failed to
> query external DNS server" client-addr="udp:172.18.0.4:48552"
> dns-server="udp:172.18.0.250:53" error="read udp 172.18.0.4:48552->
> 172.18.0.250:53: i/o timeout" question=";_dmarc.domain.tld.\tIN\t TXT"
>
>
> However, there is nothing suspicious in /var/log/messages or
> /var/log/cron that might explain what happened.
>
>
> Before the container restarted at 06:15, I tried to collect stats via the
> "kill --signal=USR1" command, but the stats never appeared in the logs.
> Apparently dnsmasq was so stuck that it couldn't even process the signal.
> (However, I don't think the stats would be helpful, since the time of the
> event doesn't change even if I restart dnsmasq between the 06:10 events.)
>
>
> Resource-wise, there was an increase in memory consumption by dnsmasq when
> the issue started, and then a spike in the middle of it (the time shown is
> 3 hours ahead of UTC):
>
>
> [image: Screenshot 2024-09-05 at 09.42.38.png]
>
>
>
>
> I'm using these params
> <https://github.com/dockur/dnsmasq/blob/master/entry.sh#L14> plus
> "fast-dns-retry". I also tried adding "no-negcache" and "all-servers",
> but that didn't fix the issue.
>
> Any ideas on where to take the investigation next?
>
> Sincerely,
> Danil Smirnov
>
>
> On Sun, Aug 25, 2024 at 7:45 PM Danil Smirnov <danil.smir...@gmail.com>
> wrote:
>
>>
>> Hi Dimitry,
>>
>> On Sun, Aug 25, 2024 at 7:36 PM Dimitry Andric <
>> dimi...@unified-streaming.com> wrote:
>>
>>> Is there any way to reproduce this issue reliably? That is, some recipe
>>> that says: run this particular docker container, run some script that
>>> queries it, observe hang after N minutes?
>>>
>>
>> For now, I have set up a watchdog in my environment that restarts the
>> container when it freezes and collects some stats along the way. I'm
>> going to monitor the issue for one more week (I have already spent a week
>> debugging it). Once I have some useful data, I'll try to reproduce it.
>>
>> Sincerely,
>> Danil Smirnov
>>
>
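
The manual check from the quoted message (dig against dnsmasq times out
while the external servers still answer) could be scripted so that the
06:10 window is captured with timestamps. The following is only a minimal
sketch, not what is actually running here: it uses nothing but the Python
standard library, 172.18.0.250 and domain.tld are taken from the message
above, and 192.0.2.53 is a placeholder standing in for one of the external
DNS servers. The function names are made up for this illustration.

```python
import socket
import struct
import time


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: one question, class IN, recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, timeout: float = 2.0, port: int = 53) -> bool:
    """Return True if the server sends a reply with a matching ID in time."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return False
    # A DNS header is 12 bytes; the first two bytes echo the query ID.
    return len(reply) >= 12 and reply[:2] == query[:2]


def watch(servers, name, interval=10.0, rounds=6):
    """Probe each server `rounds` times and print one UTC-stamped line each."""
    for _ in range(rounds):
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        for server in servers:
            status = "ok" if probe(server, name) else "TIMEOUT"
            print(f"{stamp} {server} {status}")
        time.sleep(interval)


# Example call (placeholder external resolver, see the note above):
# watch(["172.18.0.250", "192.0.2.53"], "domain.tld", interval=10, rounds=60)
```

Started shortly before 06:10 UTC, the output should show whether the
dnsmasq address alone flips to TIMEOUT while the external server keeps
answering, and exactly when that begins and ends.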
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
