> On Feb 17, 2022, at 11:41 PM, Evgeniy Berdnikov via Exim-users
> <[email protected]> wrote:
>
> On Thu, Feb 17, 2022 at 07:36:38PM -0800, Michael Tratz via Exim-users wrote:
>>> On Feb 16, 2022, at 4:17 PM, Jeremy Harris via Exim-users
>>> <[email protected]> wrote:
>>>
>>> You don't even get a single line from truss as it attaches?
>>> I wonder if the process is spinning in userland?
>>> Does "top" or similar show it?
>>
>> That stuck process is just sitting there and not doing anything. It still
>> shows in top, but it’s just idle.
>
> Process can't be "just idle", it must have its state and data structures,
> the most interesting is stack. Process state may displayed with "ps wchan":
>
>   ps -p <pid> -o pid,wchan,cmd

  PID WCHAN  COMMAND
52656 sbwait /usr/local/sbin/exim -Mc 1nNo2G-000Dgw-E1

All those processes are stuck in sbwait.

> Stack may be printed by debugger:
>
>   gdb -p <pid> -f /path/to/exim
>   (gdb) bt full

Here is the debugger output:

https://pastebin.com/ZbbdmpF2

Some line numbers don’t match up with the original exim 4.95 release due to
patching in the FreeBSD port. For:

#12 0x0000000000341a93 in tls_close (ct_ctx=0x8014ac208, do_shutdown=2)
    at ./tls-openssl.c:4400

the correct line number is 4393 in the exim 4.95 release:

https://github.com/Exim/exim/blob/fb62e7a12be6593a5432fba4a9e4468c34feef5c/src/src/tls-openssl.c#L4393

And:

#13 0x0000000000394a4f in smtp_deliver (addrlist=0x80142b4a0, host=0x80142be60,
    host_af=2, defport=25, interface=0x0, tblock=0x801452368,
    message_defer=0x7fffffffc1a4, suppress_tls=0) at smtp.c:4808

https://github.com/Exim/exim/blob/fb62e7a12be6593a5432fba4a9e4468c34feef5c/src/src/transports/smtp.c#L4819

Last week I finally had some spare time to look further into those stuck exim
processes on FreeBSD. I had a nice list of remote smtp servers which always
caused the issue, regardless of whether they replied with a 5xx or a 2xx.

I tried a more recent git version of exim, compiled exim on FreeBSD current,
and used a default configure file. None of that fixed the issue. The only
thing which helped was using GnuTLS instead of openssl.

I rebuilt the port and the OS with debugging symbols. As soon as the
SSL_shutdown function is called, the process hangs and never finishes
shutting down for certain hosts. A web search turned up reports of processes
getting “stuck” in SSL_shutdown, but I’m not that familiar with openssl.
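
For context, the documented behaviour of SSL_shutdown() is a two-step
exchange: the first call sends our close_notify and returns 0 if the peer's
has not been seen yet, and a further call waits for the peer's close_notify
and returns 1 once it arrives. On a blocking socket that wait can sleep until
the peer answers. The sketch below is only a toy model of that convention, not
OpenSSL or Exim code; toy_tls, toy_shutdown, peer_replies and TOY_WOULD_HANG
are hypothetical names invented for illustration.

```c
#include <stdbool.h>

/* Toy model of one endpoint's TLS shutdown state. These names are
   hypothetical and are not OpenSSL or Exim data structures. */
typedef struct {
  bool sent_close_notify;   /* our close_notify alert has gone out */
  bool got_close_notify;    /* the peer's close_notify has arrived */
  bool peer_replies;        /* assumption: whether this peer ever answers */
} toy_tls;

/* 0 and 1 follow the documented SSL_shutdown() return convention.
   TOY_WOULD_HANG is an invented marker meaning "a blocking socket
   would sleep here waiting for the peer". */
enum { TOY_WOULD_HANG = -1, TOY_SENT_ONLY = 0, TOY_DONE = 1 };

int toy_shutdown(toy_tls *c)
{
  if (!c->sent_close_notify)
    {
    /* First call: send our close_notify and do not wait for the peer. */
    c->sent_close_notify = true;
    return c->got_close_notify ? TOY_DONE : TOY_SENT_ONLY;
    }

  /* Later calls: our alert is already out, so all that is left is to
     read the peer's close_notify, if it ever comes. */
  if (!c->got_close_notify && c->peer_replies)
    c->got_close_notify = true;

  return c->got_close_notify ? TOY_DONE : TOY_WOULD_HANG;
}
```

With a peer that answers, two calls return 0 and then 1. With a peer that
stays silent, the second call is the point where a real process on a blocking
socket would sit waiting for data, which would be consistent with the sbwait
state shown above.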

After some research it looks like the following commit introduced the bug:

001bf8f587 Pipeline QUIT after data

In src/src/transports/smtp.c, line 4558 for that commit:

tls_close(sx->cctx.tls_ctx, TLS_SHUTDOWN_WAIT);

The tls_shutdown_wr function was also introduced in that commit, and it also
calls SSL_shutdown. When tls_close is called after tls_shutdown_wr, the first
SSL_shutdown in src/tls-openssl.c causes the process to get stuck for some
remote hosts.

The hosts that triggered the hang did not appear to be using pipelining. So I
also tried

hosts_avoid_pipelining = *

against hosts which don’t have the issue, but I couldn’t get the exim process
to hang that way. I don’t know why the issue happens with only certain remote
smtp servers.

I have added the following patch:

diff --git a/src/src/transports/smtp.c b/src/src/transports/smtp.c
index 6a979a243..f97b0c625 100644
--- a/src/src/transports/smtp.c
+++ b/src/src/transports/smtp.c
@@ -4800,7 +4800,11 @@ if (sx->send_quit || tcw_done && !tcw)
 # ifdef EXIM_TCP_CORK
     (void) setsockopt(sx->cctx.sock, IPPROTO_TCP, EXIM_TCP_CORK, US &on, sizeof(on));
 # endif
-    tls_close(sx->cctx.tls_ctx, TLS_SHUTDOWN_WAIT);
+    if (sx->send_tlsclose)
+      {
+      tls_close(sx->cctx.tls_ctx, TLS_SHUTDOWN_WAIT);
+      sx->send_tlsclose = FALSE;
+      }
     sx->cctx.tls_ctx = NULL;
     }
 #endif

Exim has been running for about a week with this patch and I haven't
experienced any issues. I don’t know if that is the correct fix or if there
is a better way, but I hope it helps in figuring out the root cause of the
issue.

Thanks,

Michael Tratz

-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
