On 07/04/2022 15:16, tt-admin via Exim-users wrote:
Here is the complete strace of the hanging process:

https://pastebin.com/wPPGab1K

31032 10:47:07 wait4(-1, 0x7fff70a35a0c, WNOHANG, NULL) = 0
31032 10:47:07 select(8, [7], NULL, NULL, {tv_sec=60, tv_usec=0}) = 0 (Timeout)

This looks like 31032 is your daemon, running with a queue_interval of 60
seconds (and, since it uses select rather than poll, you are probably
running an older Exim release with known bugs still active).
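
For anyone reading along, the wait4/select pair in the trace is the usual
shape of one pass of a daemon loop: reap any finished children without
blocking, then sleep until either a connection arrives or the interval
expires. A minimal Python sketch of that pattern follows. It is not Exim's
code; daemon_tick is a made-up name, a socketpair stands in for the
listening socket, and the interval is shortened from 60s.

```python
import os
import select
import socket


def daemon_tick(listener, interval_s):
    """One pass of a daemon-style loop: reap children, then wait for work."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break            # no children at all
        if pid == 0:
            break            # children exist, but none has exited yet
        reaped.append(pid)
    # An empty readable list means the interval expired (the "Timeout" case).
    readable, _, _ = select.select([listener], [], [], interval_s)
    return reaped, bool(readable)


# Stand-in for the listening socket; nothing is ever written to it here.
listener, _peer = socket.socketpair()
reaped, has_work = daemon_tick(listener, 0.2)
print(reaped, has_work)
```

A return of 0 from the non-blocking wait corresponds to the "= 0" on the
wait4 line: there are children, none of them finished.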


31037 10:47:07 <... recvfrom resumed> 0x55f93be5671b, 324, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
31032 10:47:07 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=32665, si_uid=0} ---
31037 10:47:07 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=32665, si_uid=0} ---

Presumably 31037 is the process of interest.  SIGUSR1 was possibly the result of
running exiwhat.

31037 10:46:58 write(9, "31037 delivering 1nZqMd-00084U-Qc to foo.bar [x.x.x.x] (foog@bar)\n", 105 <unfinished ...>
31037 10:46:58 rt_sigreturn({mask=[]} <unfinished ...>

consistent with exiwhat

31037 10:46:58 recvfrom(7,  <unfinished ...>

and back to waiting in read-from-network...
We can at least discount a lack-of-entropy issue.
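
To illustrate the sequence above (signal arrives, status line is written,
the interrupted read is restarted), here is a small Python sketch. It is
only an analogy, not what Exim does internally: report_status and the
status text are invented, a socketpair stands in for the SMTP connection,
and Python's automatic retry of interrupted system calls plays the role of
SA_RESTART.

```python
import os
import signal
import socket
import threading

status_lines = []


def report_status(signum, frame):
    # Stand-in for the write() of the "delivering ..." line seen in the trace.
    status_lines.append(f"{os.getpid()} delivering <message-id> to <host>")


signal.signal(signal.SIGUSR1, report_status)
reader, writer = socket.socketpair()
main_thread_id = threading.main_thread().ident

# Poke the blocked reader with SIGUSR1, then let the peer finally answer.
threading.Timer(0.2, signal.pthread_kill,
                (main_thread_id, signal.SIGUSR1)).start()
threading.Timer(0.6, writer.sendall, (b"220 ready\r\n",)).start()

# Blocks; the handler runs when the signal lands, then the read resumes.
data = reader.recv(324)
print(status_lines, data)
```

The point is that the signal by itself does not end the wait: the process
reports what it is doing and goes straight back to reading from the network.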



The call from the Exim transport to the GnuTLS library gnutls_handshake()
routine is wrapped in an alarm() call, with the timeout taken from the
transport option "command_timeout".
The default for that is 5 minutes (but check your config; a setting of
two hours would probably be unwise). You also mentioned that you've seen
5 minutes on other connections, presumably using the same transport.
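
As a rough picture of what "wrapped in an alarm() call" means, here is a
minimal Python sketch of the general technique: arm a timer, make the
blocking call, and always disarm afterwards. It is not the Exim source;
guarded_recv and HandshakeTimeout are hypothetical names, a plain recv() on
a socketpair stands in for gnutls_handshake(), and the timeout is 1 second
rather than command_timeout.

```python
import signal
import socket


class HandshakeTimeout(Exception):
    """Raised by the SIGALRM handler when the guarded call overruns."""


def _on_alarm(signum, frame):
    raise HandshakeTimeout()


def guarded_recv(sock, nbytes, timeout_s):
    """Run a blocking recv() under an alarm; return None on timeout."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)          # arm the timer before blocking
    try:
        return sock.recv(nbytes)
    except HandshakeTimeout:
        return None
    finally:
        signal.alarm(0)              # always disarm the timer
        signal.signal(signal.SIGALRM, previous)


a, b = socket.socketpair()
timed_out = guarded_recv(a, 324, 1) is None   # peer is silent: alarm fires
b.sendall(b"hello")
got = guarded_recv(a, 324, 1)                 # peer answered: returns data
print(timed_out, got)
```

If the alarm is armed with a sane value, a silent peer can only hold the
call for that long, which is why the value actually passed to the timer is
the interesting thing to see in the trace.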

Seeing (via strace) the syscall *setting* that alarm might be interesting
(though I fear we'll see it being 300s and be no closer to a fix).
--
Cheers,
  Jeremy
